Re: Review Request: HIVE-2036: Update bitmap indexes for automatic usage

2011-06-10 Thread Syed Albiz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/857/
---

(Updated 2011-06-10 06:35:32.125295)


Review request for hive and John Sichi.


Changes
---

Based on a discussion with yongqian, I re-implemented the predicate 
decomposition as two steps: first computing the overall residual predicate from 
the union of all columns in the available indexes, and then computing the 
predicates to apply to each index individually. I have also extended the 
functionality to pass in partition columns to allowColumnNames and 
added/extended the testcases to check that partition predicates are propagated 
correctly. This required adding a check in IndexWhereProcessor.java that the 
correct FilterOperator was passed to the process(...) method (apparently a 
duplicate FilterOperator that does not have the entire predicate gets created).
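
The two-step decomposition described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: it assumes the predicate is a flat list of AND-ed terms that each refer to a single column, and every name in it (`Conjunct`, `PredicateDecomposition`, `residual`, `perIndex`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of one AND-ed term, e.g. "key = 86", that refers to a single column. */
final class Conjunct {
    final String column;
    final String text;

    Conjunct(String column, String text) {
        this.column = column;
        this.text = text;
    }

    @Override
    public String toString() {
        return text;
    }
}

/** Illustrative only; the names and structure are assumptions, not the patch's API. */
final class PredicateDecomposition {

    /** Step 1: conjuncts on columns that no available index covers form the residual. */
    static List<Conjunct> residual(List<Conjunct> predicate,
                                   Map<String, Set<String>> indexColumns) {
        Set<String> covered = new LinkedHashSet<>();
        for (Set<String> columns : indexColumns.values()) {
            covered.addAll(columns); // union of the columns of all available indexes
        }
        List<Conjunct> rest = new ArrayList<>();
        for (Conjunct c : predicate) {
            if (!covered.contains(c.column)) {
                rest.add(c);
            }
        }
        return rest;
    }

    /** Step 2: for each index, the conjuncts that this index can evaluate on its own. */
    static Map<String, List<Conjunct>> perIndex(List<Conjunct> predicate,
                                                Map<String, Set<String>> indexColumns) {
        Map<String, List<Conjunct>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> index : indexColumns.entrySet()) {
            List<Conjunct> usable = new ArrayList<>();
            for (Conjunct c : predicate) {
                if (index.getValue().contains(c.column)) {
                    usable.add(c);
                }
            }
            result.put(index.getKey(), usable);
        }
        return result;
    }
}
```

Under these assumptions, a conjunct on a partition column would be handled the same way as any other column once that column is added to an index's column set.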


Summary
---

Add support for generating index queries to support automatic usage of bitmap 
indexes. This required changing the interface to the IndexHandlers to support 
accepting queries on multiple indexes. The compact indexes were modified to use 
this new interface as well, although no functional changes were made to how 
they work. Only supports AND predicates right now, but it should be possible to 
extend the BitmapQuery interface defined in this patch to easily support OR 
predicates as well. Currently benchmarking these changes on a test cluster.
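
To illustrate why OR support looks like a natural extension, here is a small sketch of the underlying bitmap semantics using `java.util.BitSet`. It is not the BitmapQuery interface from the patch (which generates index queries rather than combining bitmaps in memory); the class `BitmapCombine` and its methods are hypothetical, and each bit is assumed to stand for one row offset.

```java
import java.util.BitSet;

/** Illustrative only: AND-ed predicates intersect row bitmaps, OR-ed predicates union them. */
final class BitmapCombine {

    /** Builds a bitmap with the given row offsets set. */
    static BitSet of(int... rows) {
        BitSet bits = new BitSet();
        for (int row : rows) {
            bits.set(row);
        }
        return bits;
    }

    /** Rows matching both predicates: bitwise AND of the two bitmaps. */
    static BitSet and(BitSet left, BitSet right) {
        BitSet result = (BitSet) left.clone(); // leave the inputs untouched
        result.and(right);
        return result;
    }

    /** Rows matching either predicate: bitwise OR of the two bitmaps. */
    static BitSet or(BitSet left, BitSet right) {
        BitSet result = (BitSet) left.clone();
        result.or(right);
        return result;
    }
}
```

For example, `BitmapCombine.and(BitmapCombine.of(1, 3, 5), BitmapCombine.of(3, 5, 7))` keeps rows 3 and 5, while `or` on the same inputs keeps rows 1, 3, 5 and 7.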


This addresses bug HIVE-2036.
https://issues.apache.org/jira/browse/HIVE-2036


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 4fba845
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java e5ee183
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java af9d7b1
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapInnerQuery.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapOuterQuery.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapQuery.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 56e7609
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java d64e88b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java 268560d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 0873e1a
  ql/src/test/queries/clientpositive/index_auto_partitioned.q 5f92f04
  ql/src/test/queries/clientpositive/index_bitmap_auto.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_bitmap_auto_partitioned.q PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_partitioned.q.out 05cc84a
  ql/src/test/results/clientpositive/index_bitmap_auto.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/857/diff


Testing
---

Passes unit tests. An additional testcase for automatic bitmap indexing, 
index_bitmap_auto.q, was also added to the TestCliDriver suite. Currently 
benchmarking changes on a test cluster.


Thanks,

Syed



[jira] [Commented] (HIVE-2036) Update bitmap indexes for automatic usage

2011-06-10 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047029#comment-13047029
 ] 

jirapos...@reviews.apache.org commented on HIVE-2036:
-





 Update bitmap indexes for automatic usage
 -

 Key: HIVE-2036
 URL: https://issues.apache.org/jira/browse/HIVE-2036
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
 Attachments: HIVE-2036.1.patch


 HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap 
 index support.  The bitmap code will need to be extended after it is 
 committed to enable automatic use of indexing.  Most work will be focused in 
 the BitmapIndexHandler, which needs to generate the re-entrant QL index 
 query.  There may also be significant work in the IndexPredicateAnalyzer to 
 support predicates with ORs, instead of only ANDs as it does currently.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-2213) Optimize get_partition_names_ps()

2011-06-10 Thread Sohan Jain (JIRA)
Optimize get_partition_names_ps()
-

 Key: HIVE-2213
 URL: https://issues.apache.org/jira/browse/HIVE-2213
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain


If a table has a large number of partitions, get_partition_names_ps() may take 
a long time to execute, because we get all of the partition names from the 
database.  This is not very memory efficient, and the operation can be pushed 
down to the JDO layer without getting all of the names first.
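
As a rough sketch of the idea, the partial specification can be turned into a single name pattern that a datastore query could evaluate, so that only matching names are returned. The code below is illustrative only: it assumes partition names have the form `k1=v1/k2=v2` and that an empty value in the partial specification means "any value", and the class `PartialSpecFilter` and its methods are hypothetical. An in-memory list stands in for the datastore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: builds one name pattern from a partial partition specification. */
final class PartialSpecFilter {

    /** Empty or missing values become wildcards; given values must match literally. */
    static Pattern toPattern(List<String> keys, List<String> partialValues) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) {
                regex.append('/');
            }
            regex.append(Pattern.quote(keys.get(i))).append('=');
            String value = i < partialValues.size() ? partialValues.get(i) : "";
            regex.append(value.isEmpty() ? "[^/]*" : Pattern.quote(value));
        }
        return Pattern.compile(regex.toString());
    }

    /** Stand-in for the pushed-down query: keeps only the names matching the pattern. */
    static List<String> filter(List<String> partitionNames, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        for (String name : partitionNames) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

In a real metastore the pattern (or an equivalent filter expression) would be evaluated by the backing database, so the full list of names would never be materialized in memory.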



[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()

2011-06-10 Thread Sohan Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2213:
-

Attachment: HIVE-2213.1.patch



Review Request: HIVE-2213: Optimize get_partition_names_ps()

2011-06-10 Thread Sohan Jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
---

Review request for hive and Paul Yang.


Summary
---

If a table has a large number of partitions, get_partition_names_ps() may take 
a long time to execute, because we get all of the partition names from the 
database. This is not very memory efficient, and the operation can be pushed 
down to the JDO layer without getting all of the names first.


This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213


Diffs
-

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205

Diff: https://reviews.apache.org/r/878/diff


Testing
---

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


Thanks,

Sohan



[jira] [Commented] (HIVE-243) ^C breaks out of running query, but not whole CLI

2011-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047043#comment-13047043
 ] 

Hudson commented on HIVE-243:
-

Integrated in Hive-trunk-h0.21 #771 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/771/])
HIVE-2211. Fix a bug caused by HIVE-243 (Siying Dong via Ning Zhang)

nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1134179
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java


 ^C breaks out of running query, but not whole CLI
 -

 Key: HIVE-243
 URL: https://issues.apache.org/jira/browse/HIVE-243
 Project: Hive
  Issue Type: Wish
  Components: Query Processor
Affects Versions: 0.8.0
Reporter: Adam Kramer
Assignee: George Djabarov
 Fix For: 0.8.0

 Attachments: HIVE-243.patch


 It would be lovely if, when I know a query is bad, I could just ^C out of it. 
 I can do that now, but the whole CLI quits.
 It'd be quite nice if it took an extra ^C to break the CLI, or if there was 
 some control character to break out of a query without breaking out of the 
 CLI.



[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

2011-06-10 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047041#comment-13047041
 ] 

jirapos...@reviews.apache.org commented on HIVE-2213:
-




[jira] [Commented] (HIVE-2208) create a new API in Warehouse where the root directory is specified

2011-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047042#comment-13047042
 ] 

Hudson commented on HIVE-2208:
--

Integrated in Hive-trunk-h0.21 #771 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/771/])


 create a new API in Warehouse where the root directory is specified
 ---

 Key: HIVE-2208
 URL: https://issues.apache.org/jira/browse/HIVE-2208
 Project: Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.2208.1.patch


 It would be useful to create tables in multiple DFS's



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Assignee: Siying Dong
 Summary: reduce name node calls in hive by creating temporary directories  
(was: remove name node calls in hive by creating temporary directories)

 reduce name node calls in hive by creating temporary directories
 

 Key: HIVE-2201
 URL: https://issues.apache.org/jira/browse/HIVE-2201
 Project: Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Siying Dong

 Currently, in Hive, when a file gets written by a FileSinkOperator,
 the sequence of operations is as follows:
 1. In tmp directory tmp1, create a tmp file _tmp_1
 2. At the end of the operator, move
 /tmp1/_tmp_1 to /tmp1/1
 3. Move directory /tmp1 to /tmp2
 4. For all files in /tmp2, remove all files starting with _tmp and
 duplicate files.
 Due to speculative execution, a lot of temporary files are created
 in /tmp1 (or /tmp2). This leads to a lot of name node calls,
 especially for large queries.
 The protocol above can be modified slightly:
 1. In tmp directory tmp1, create a tmp file _tmp_1
 2. At the end of the operator, move
 /tmp1/_tmp_1 to /tmp2/1
 3. Move directory /tmp2 to /tmp3
 4. For all files in /tmp3, remove all duplicate files.
 This should reduce the number of tmp files.
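
 Steps 1 to 3 of the modified protocol can be sketched as below. This is only an illustration: it uses the local filesystem through java.nio.file as a stand-in for DFS, it handles a single task, it omits step 4 (duplicate removal), and the class TmpFileProtocol and its methods are hypothetical rather than part of FileSinkOperator.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: the modified protocol for one task on a local filesystem. */
final class TmpFileProtocol {

    /** Runs steps 1 to 3 under the given base directory and returns the final directory. */
    static Path run(Path base, String taskId, String data) {
        try {
            Path tmp1 = Files.createDirectories(base.resolve("tmp1"));
            Path tmp2 = Files.createDirectories(base.resolve("tmp2"));
            Path tmp3 = base.resolve("tmp3");

            // Step 1: write the in-progress file _tmp_<task> inside tmp1.
            Path inProgress = tmp1.resolve("_tmp_" + taskId);
            Files.write(inProgress, data.getBytes(StandardCharsets.UTF_8));

            // Step 2: on success, move it straight into tmp2 under its final name,
            // so tmp2 never contains any _tmp_ files.
            Files.move(inProgress, tmp2.resolve(taskId));

            // Step 3: move the whole tmp2 directory to tmp3.
            Files.move(tmp2, tmp3);
            return tmp3;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a scratch directory for trying the sketch out. */
    static Path newScratchDir() {
        try {
            return Files.createTempDirectory("tmp-protocol-demo");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

 Because unfinished attempts only ever leave files behind in tmp1, the cleanup pass over the final directory no longer has to look for names starting with _tmp.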



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: HIVE-2201.1.patch

Implemented the logic.
Discovered one problem: when moving from /tmp1/_tmp_1 to /tmp2/1, we might need 
to check whether /tmp2 exists before moving the file. This patch avoids this 
call by pre-creating the temp directory before submitting the job. However, we 
cannot do that for dynamic partitioning, as we don't know the directory names 
in advance. So for dynamic partitioning, some extra DFS namenode read cost is 
added. So far I think this tradeoff is worthwhile. Potentially this cost can be 
reduced by caching the directories already created. We can try that approach as 
a followup.
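
A minimal sketch of the caching idea, assuming each task remembers which directories it has already created so that the namenode is contacted at most once per directory. The class `CreatedDirCache` is hypothetical, and the `Consumer<String>` passed to it is a stand-in for whatever call actually creates the directory on DFS.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Illustrative only: remembers created directories to avoid repeated namenode calls. */
final class CreatedDirCache {
    private final Set<String> created = new HashSet<>();
    private final Consumer<String> mkdirs; // stand-in for the real DFS call

    CreatedDirCache(Consumer<String> mkdirs) {
        this.mkdirs = mkdirs;
    }

    /** Creates the directory on first use; returns true only if the DFS call was made. */
    boolean ensure(String dir) {
        if (!created.add(dir)) {
            return false; // already created by this task, no namenode call needed
        }
        mkdirs.accept(dir);
        return true;
    }
}
```

With dynamic partitioning, a task writing many rows into the same few partitions would then pay the extra call once per partition directory rather than once per file move.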



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Status: Patch Available  (was: In Progress)



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: (was: HIVE-2201.1.patch)



[jira] [Work started] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-2201 started by Siying Dong.



[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: HIVE-2201.1.patch



[jira] [Updated] (HIVE-2209) Provide a way by which ObjectInspectorUtils.compare can be extended by the caller for comparing maps which are part of the object

2011-06-10 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2209:


Attachment: HIVE-2209v0.patch

Patch, with tests, added.

 Provide a way by which ObjectInspectorUtils.compare can be extended by the 
 caller for comparing maps which are part of the object
 -

 Key: HIVE-2209
 URL: https://issues.apache.org/jira/browse/HIVE-2209
 Project: Hive
  Issue Type: Improvement
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE-2209v0.patch


 Currently, ObjectInspectorUtils.compare throws an exception if a map is 
 contained (recursively) within the objects being compared. Two obvious 
 implementations of map comparison are
 - a simple map comparer which assumes keys of the first map can be used to 
 fetch values from the second
 - a 'cross-product' comparer which compares every pair of key-value pairs in 
 the two maps, and calls a match if and only if all pairs are matched
 Note that it would be difficult to provide a transitive 
 greater-than/less-than indication with maps so that is not in scope. 
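
 The two strategies can be sketched on plain java.util.Map values as below. This is only an illustration of the idea, not the patch: the class MapEqualitySketch and its methods are hypothetical, values are assumed to be non-null, and the result only distinguishes equal (0) from not equal (1).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative only: equality-style comparison of two maps, 0 = match, 1 = no match. */
final class MapEqualitySketch {

    /** Simple strategy: use the keys of the first map to fetch values from the second. */
    static <K, V> int simpleCompare(Map<K, V> a, Map<K, V> b, Comparator<V> valueCmp) {
        if (a.size() != b.size()) {
            return 1;
        }
        for (Map.Entry<K, V> e : a.entrySet()) {
            if (!b.containsKey(e.getKey())
                    || valueCmp.compare(e.getValue(), b.get(e.getKey())) != 0) {
                return 1;
            }
        }
        return 0;
    }

    /** Cross-product strategy: pair up entries by comparing keys and values directly. */
    static <K, V> int crossCompare(Map<K, V> a, Map<K, V> b,
                                   Comparator<K> keyCmp, Comparator<V> valueCmp) {
        if (a.size() != b.size()) {
            return 1;
        }
        List<Map.Entry<K, V>> unmatched = new ArrayList<>(b.entrySet());
        for (Map.Entry<K, V> ea : a.entrySet()) {
            boolean found = false;
            for (Iterator<Map.Entry<K, V>> it = unmatched.iterator(); it.hasNext();) {
                Map.Entry<K, V> eb = it.next();
                if (keyCmp.compare(ea.getKey(), eb.getKey()) == 0
                        && valueCmp.compare(ea.getValue(), eb.getValue()) == 0) {
                    it.remove(); // each entry of b may be matched only once
                    found = true;
                    break;
                }
            }
            if (!found) {
                return 1;
            }
        }
        return 0;
    }
}
```

 The difference shows up when key equality under the comparator differs from the map's own lookup: with a case-insensitive key comparator, {"K": 1} and {"k": 1} match under the cross-product strategy but not under the simple one.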



[jira] [Commented] (HIVE-2188) Add get_table_objects_by_name() to Hive MetaStore

2011-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047150#comment-13047150
 ] 

Hudson commented on HIVE-2188:
--

Integrated in Hive-trunk-h0.21 #772 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/772/])
HIVE-2188. Add get_table_objects_by_name() to Hive MetaStore (Sohan Jain 
via cws)

cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1134183
Files : 
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h
* /hive/trunk/metastore/if/hive_metastore.thrift
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
* /hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb


 Add get_table_objects_by_name() to Hive MetaStore
 -

 Key: HIVE-2188
 URL: https://issues.apache.org/jira/browse/HIVE-2188
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2188.1.patch, HIVE-2188.3.patch


 This function would get multiple tables from the hive metastore as opposed to 
 just one at a time, saving round trip time to the metastore.



[jira] [Updated] (HIVE-2209) Provide a way by which ObjectInspectorUtils.compare can be extended by the caller for comparing maps which are part of the object

2011-06-10 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated HIVE-2209:


Status: Patch Available  (was: Open)

For review by He Yongqiang



Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations

2011-06-10 Thread Krishna

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/879/
---

Review request for hive and Yongqiang He.


Summary
---

Patch for HIVE-2209


Diffs
-

  
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualcomparer.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualcomparer.java PRE-CREATION

Diff: https://reviews.apache.org/r/879/diff


Testing
---

Tests added


Thanks,

Krishna



[jira] [Updated] (HIVE-2036) Update bitmap indexes for automatic usage

2011-06-10 Thread Syed S. Albiz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed S. Albiz updated HIVE-2036:


Attachment: HIVE-2036.3.patch

This patch is still WIP; there are a couple of issues that I know still need 
correcting. In particular, the index_auto_unused.q testcase now fails. Before I 
updated the partition predicates to propagate properly, there was no check that 
the index was built on the partition being queried, but the testcase still 
passed because partition predicates weren't propagated anyway.

I probably also want to refactor the logic in IndexWhereProcessor before this 
is ready.

 Update bitmap indexes for automatic usage
 -

 Key: HIVE-2036
 URL: https://issues.apache.org/jira/browse/HIVE-2036
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
 Attachments: HIVE-2036.1.patch, HIVE-2036.3.patch


 HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap 
 index support.  The bitmap code will need to be extended after it is 
 committed to enable automatic use of indexing.  Most work will be focused in 
 the BitmapIndexHandler, which needs to generate the re-entrant QL index 
 query.  There may also be significant work in the IndexPredicateAnalyzer to 
 support predicates with ORs, instead of just ANDs as it currently does.
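
As a rough illustration of why AND predicates map naturally onto bitmap 
indexes, here is a minimal sketch that keeps one bitmap per (column, value) 
pair and intersects the bitmaps with bitwise AND. It is a toy model in Python, 
not Hive's BitmapIndexHandler; the function names build_bitmap_index and 
rows_matching_all are hypothetical and exist only for this example.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bitmap.

    Bit i is set when rows[i][column] equals that value.
    """
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index

def rows_matching_all(indexes, predicates, row_count):
    """Return row ids satisfying every `column == value` predicate (AND only)."""
    result = (1 << row_count) - 1  # start with all rows selected
    for column, value in predicates.items():
        # A value absent from the index matches no rows (bitmap 0).
        result &= indexes[column].get(value, 0)
    return [i for i in range(row_count) if (result >> i) & 1]

# Hypothetical sample data: three rows with a regular and a partition column.
rows = [
    {"key": 1, "ds": "2011-06-10"},
    {"key": 2, "ds": "2011-06-10"},
    {"key": 1, "ds": "2011-06-09"},
]
indexes = {col: build_bitmap_index(rows, col) for col in ("key", "ds")}
matches = rows_matching_all(indexes, {"key": 1, "ds": "2011-06-10"}, len(rows))
```

In this toy model, OR support would amount to combining bitmaps with `|` 
rather than `&`; how that would fit into the IndexPredicateAnalyzer is not 
something this sketch tries to answer.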



Build failed in Jenkins: Hive-branch-0.7.1-h0.21 #19

2011-06-10 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/19/

--
[...truncated 27383 lines...]
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-06-10_12-47-51_236_3118745972910338142/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] 2011-06-10 12:47:54,280 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-06-10_12-47-51_236_3118745972910338142/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201106101247_1556750958.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-06-10_12-47-56_568_7709338130334341560/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-06-10_12-47-56_568_7709338130334341560/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 

[jira] [Assigned] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed

2011-06-10 Thread Franklin Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Franklin Hu reassigned HIVE-2035:
-

Assignee: Franklin Hu

 Use block-level merge for RCFile if merging intermediate results are needed
 ---

 Key: HIVE-2035
 URL: https://issues.apache.org/jira/browse/HIVE-2035
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Franklin Hu

 Currently, if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true, 
 the intermediate data can be merged using an additional MapReduce job. This 
 can be quite expensive if the data size is large. With HIVE-1950, merging 
 can be done at the RCFile block level so that it bypasses the 
 (de-)compression and (de-)serialization phases. This could improve the merge 
 process significantly. 
 This JIRA should handle the case where the input table is not stored in 
 RCFile, but the destination table is (which requires that the intermediate 
 data be stored in the same format as the destination table). 
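
To make the difference between a record-level and a block-level merge 
concrete, here is a minimal sketch. It is a toy model only: zlib-compressed 
chunks stand in for RCFile blocks, and every function name is hypothetical. It 
does not use the real RCFile format or the HIVE-1950 code.

```python
import zlib

def write_blocks(records, block_size):
    """Compress records into independent blocks (a stand-in for RCFile blocks)."""
    blocks = []
    for start in range(0, len(records), block_size):
        payload = "\n".join(records[start:start + block_size]).encode("utf-8")
        blocks.append(zlib.compress(payload))
    return blocks

def read_records(blocks):
    """Decompress and deserialize every block back into records."""
    records = []
    for block in blocks:
        records.extend(zlib.decompress(block).decode("utf-8").split("\n"))
    return records

def merge_record_level(files, block_size):
    """Decompress every block, then re-serialize and re-compress all records."""
    records = [rec for blocks in files for rec in read_records(blocks)]
    return write_blocks(records, block_size)

def merge_block_level(files):
    """Concatenate already-compressed blocks without touching their contents."""
    return [block for blocks in files for block in blocks]

# Two small intermediate "files", each a list of compressed blocks.
small_files = [write_blocks(["a", "b"], 2), write_blocks(["c"], 2)]
merged = merge_block_level(small_files)
```

Both merges yield the same records when read back; the block-level version 
simply never calls zlib during the merge, which is the saving the description 
above refers to.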



[jira] [Updated] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed

2011-06-10 Thread Franklin Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Franklin Hu updated HIVE-2035:
--

Attachment: hive-2035.1.patch

Implements block-level merge of intermediate results to a table or partition 
stored as RCFile.

 Use block-level merge for RCFile if merging intermediate results are needed
 ---

 Key: HIVE-2035
 URL: https://issues.apache.org/jira/browse/HIVE-2035
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Franklin Hu
 Attachments: hive-2035.1.patch


 Currently, if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true, 
 the intermediate data can be merged using an additional MapReduce job. This 
 can be quite expensive if the data size is large. With HIVE-1950, merging 
 can be done at the RCFile block level so that it bypasses the 
 (de-)compression and (de-)serialization phases. This could improve the merge 
 process significantly. 
 This JIRA should handle the case where the input table is not stored in 
 RCFile, but the destination table is (which requires that the intermediate 
 data be stored in the same format as the destination table). 



Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()

2011-06-10 Thread Paul Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review804
---


You can do this here or in a separate JIRA, but can you update 
get_partitions_ps() using a similar technique?


trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
https://reviews.apache.org/r/878/#comment1753

Can you refactor with the above function since they are similar?



trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
https://reviews.apache.org/r/878/#comment1754

Same here



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/878/#comment1755

To be consistent with the other method, maybe call this 
listPartitionNamesPs?



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
https://reviews.apache.org/r/878/#comment1756

Combine with above


- Paul


On 2011-06-10 07:05:56, Sohan Jain wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/878/
 ---
 
 (Updated 2011-06-10 07:05:56)
 
 
 Review request for hive and Paul Yang.
 
 
 Summary
 ---
 
 If a table has a large number of partitions, get_partition_names_ps() may 
 take a long time to execute, because we get all of the partition names from 
 the database. This is not very memory efficient, and the operation can be 
 pushed down to the JDO layer without getting all of the names first.
 
 
 This addresses bug HIVE-2213.
 https://issues.apache.org/jira/browse/HIVE-2213
 
 
 Diffs
 -
 
   trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205 
   
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
 1134205 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 1134205 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
 1134205 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 
 1134205 
   
 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
  1134205 
 
 Diff: https://reviews.apache.org/r/878/diff
 
 
 Testing
 ---
 
 Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
 
 
 Thanks,
 
 Sohan
 




[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

2011-06-10 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047452#comment-13047452
 ] 

jirapos...@reviews.apache.org commented on HIVE-2213:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review804
---


You can do this here or in a separate JIRA, but can you update 
get_partitions_ps() using a similar technique?


trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
https://reviews.apache.org/r/878/#comment1753

Can you refactor with the above function since they are similar?



trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
https://reviews.apache.org/r/878/#comment1754

Same here



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/878/#comment1755

To be consistent with the other method, maybe call this 
listPartitionNamesPs?



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
https://reviews.apache.org/r/878/#comment1756

Combine with above


- Paul


On 2011-06-10 07:05:56, Sohan Jain wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/878/
bq.  ---
bq.  
bq.  (Updated 2011-06-10 07:05:56)
bq.  
bq.  
bq.  Review request for hive and Paul Yang.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  If a table has a large number of partitions, get_partition_names_ps() may 
take a long time to execute, because we get all of the partition names from the 
database. This is not very memory efficient, and the operation can be pushed 
down to the JDO layer without getting all of the names first.
bq.  
bq.  
bq.  This addresses bug HIVE-2213.
bq.  https://issues.apache.org/jira/browse/HIVE-2213
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 
1134205 
bq.
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
1134205 
bq.
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
1134205 
bq.trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
1134205 
bq.trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 
1134205 
bq.
trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
 1134205 
bq.  
bq.  Diff: https://reviews.apache.org/r/878/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Passes previous test cases for get_partition_names_ps() in 
TestHiveMetaStore.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Sohan
bq.  
bq.



 Optimize get_partition_names_ps()
 -

 Key: HIVE-2213
 URL: https://issues.apache.org/jira/browse/HIVE-2213
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2213.1.patch


 If a table has a large number of partitions, get_partition_names_ps() may 
 take a long time to execute, because we get all of the partition names from 
 the database.  This is not very memory efficient, and the operation can be 
 pushed down to the JDO layer without getting all of the names first.
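
The pushdown idea can be sketched as follows. This is an illustration under 
stated assumptions, not the metastore code: an in-memory SQLite table named 
partitions stands in for the JDO-backed store, an empty string in the partial 
spec is assumed to mean "any value", and the helper names glob_pattern and 
partition_names_ps are hypothetical.

```python
import sqlite3

def glob_pattern(part_keys, part_vals):
    """Build a GLOB pattern from a partial partition spec.

    An empty string in `part_vals` means "any value" for that key; keys past
    the end of `part_vals` are also unconstrained.
    """
    def escape(text):
        # Wrap GLOB metacharacters in brackets so they match literally.
        return "".join("[" + ch + "]" if ch in "*?[" else ch for ch in text)

    pieces = []
    for i, key in enumerate(part_keys):
        val = part_vals[i] if i < len(part_vals) else ""
        pieces.append(escape(key) + "=" + (escape(val) if val else "*"))
    return "/".join(pieces)

def partition_names_ps(conn, part_keys, part_vals):
    """Let the database filter names instead of loading all of them first."""
    cursor = conn.execute(
        "SELECT name FROM partitions WHERE name GLOB ? ORDER BY name",
        (glob_pattern(part_keys, part_vals),),
    )
    return [name for (name,) in cursor]

# Hypothetical stand-in for the metastore's partition table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partitions (name TEXT)")
conn.executemany(
    "INSERT INTO partitions VALUES (?)",
    [("ds=2011-06-10/hr=11",), ("ds=2011-06-10/hr=12",), ("ds=2011-06-09/hr=12",)],
)
names = partition_names_ps(conn, ["ds", "hr"], ["2011-06-10"])
```

Only the matching names cross the database boundary here, which is the memory 
saving the description is after; the actual patch would express the filter 
through JDO rather than raw SQL.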



[jira] [Created] (HIVE-2215) Add api for marking / querying set of partitions for events

2011-06-10 Thread Ashutosh Chauhan (JIRA)
Add api for marking / querying set of partitions for events
---

 Key: HIVE-2215
 URL: https://issues.apache.org/jira/browse/HIVE-2215
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0






[jira] [Updated] (HIVE-2215) Add api for marking / querying set of partitions for events

2011-06-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-2215:
---

Attachment: hive_2215.patch

Patch including generated code. Will post on RB without generated code. 
Incorporates feedback from John on HIVE-2147

 Add api for marking / querying set of partitions for events
 ---

 Key: HIVE-2215
 URL: https://issues.apache.org/jira/browse/HIVE-2215
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: hive_2215.patch






[jira] [Updated] (HIVE-2215) Add api for marking / querying set of partitions for events

2011-06-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-2215:
---

Status: Patch Available  (was: Open)

This patch is ready for review.

 Add api for marking / querying set of partitions for events
 ---

 Key: HIVE-2215
 URL: https://issues.apache.org/jira/browse/HIVE-2215
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: hive_2215.patch






[jira] [Commented] (HIVE-2215) Add api for marking / querying set of partitions for events

2011-06-10 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047466#comment-13047466
 ] 

jirapos...@reviews.apache.org commented on HIVE-2215:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/883/
---

Review request for hive and John Sichi.


Summary
---

Follow-up for HIVE-2147.


This addresses bug HIVE-2215.
https://issues.apache.org/jira/browse/HIVE-2215


Diffs
-

  trunk/metastore/if/hive_metastore.thrift 1134443 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
1134443 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 1134443 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
1134443 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java
 1134443 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
1134443 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
1134443 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/MarkPartitionEvent.java
 PRE-CREATION 
  
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionEvent.java
 PRE-CREATION 
  trunk/metastore/src/model/package.jdo 1134443 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 
1134443 
  
trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartitionSet.java
 PRE-CREATION 
  
trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java
 1134443 

Diff: https://reviews.apache.org/r/883/diff


Testing
---

Added test cases for new api.


Thanks,

Ashutosh



 Add api for marking / querying set of partitions for events
 ---

 Key: HIVE-2215
 URL: https://issues.apache.org/jira/browse/HIVE-2215
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: hive_2215.patch






[jira] [Commented] (HIVE-2147) Add api to send / receive message to metastore

2011-06-10 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047468#comment-13047468
 ] 

Paul Yang commented on HIVE-2147:
-

I agree with John's suggestion for PARTITION_EVENTS. For this event table, when 
will rows be dropped? Also, when partitions are represented as a string, our 
convention is to call them partition names. Can we use that for 
MPartitionSet?

Since MPartitionSet.partVals is a string, we should make it indexed, much like 
partitionName for the PARTITION table.

 Add api to send / receive message to metastore
 --

 Key: HIVE-2147
 URL: https://issues.apache.org/jira/browse/HIVE-2147
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: api-without-thrift.patch, hive_2147-2.patch


 This is follow-up work on HIVE-2038.



[jira] [Updated] (HIVE-2147) Add api to send / receive message to metastore

2011-06-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-2147:
---

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

As suggested, HIVE-2215 has been opened for this.

 Add api to send / receive message to metastore
 --

 Key: HIVE-2147
 URL: https://issues.apache.org/jira/browse/HIVE-2147
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: api-without-thrift.patch, hive_2147-2.patch


 This is follow-up work on HIVE-2038.



Review Request: HIVE-2215

2011-06-10 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/883/
---

Review request for hive and John Sichi.


Summary
---

Follow-up for HIVE-2147.


This addresses bug HIVE-2215.
https://issues.apache.org/jira/browse/HIVE-2215


Diffs
-

  trunk/metastore/if/hive_metastore.thrift 1134443 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
1134443 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 1134443 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
1134443 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java
 1134443 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
1134443 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
1134443 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/MarkPartitionEvent.java
 PRE-CREATION 
  
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionEvent.java
 PRE-CREATION 
  trunk/metastore/src/model/package.jdo 1134443 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 
1134443 
  
trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartitionSet.java
 PRE-CREATION 
  
trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java
 1134443 

Diff: https://reviews.apache.org/r/883/diff


Testing
---

Added test cases for new api.


Thanks,

Ashutosh



[jira] [Commented] (HIVE-2215) Add api for marking / querying set of partitions for events

2011-06-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047506#comment-13047506
 ] 

Ashutosh Chauhan commented on HIVE-2215:


Replying to Paul's comments here, since I closed HIVE-2147:

bq. I agree with John's suggestion for PARTITION_EVENTS. For this event table, 
when will rows be dropped?
This also needs to be considered. I would prefer to do it in a follow-up JIRA 
to keep this one manageable.

bq. Also, for when partitions are represented using a string, we've followed 
the convention that they are called partition names. Can we use that for 
MPartitionSet?
Yup, I can rename that.

bq. Since MPartitionSet.partVals is a string, we should make it indexed, much 
like partitionName for the PARTITION table.
In the latest patch, I have made it indexed.

If you could take a look at the latest patch, that would be great.

 Add api for marking / querying set of partitions for events
 ---

 Key: HIVE-2215
 URL: https://issues.apache.org/jira/browse/HIVE-2215
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: hive_2215.patch






[jira] [Commented] (HIVE-1537) Allow users to specify LOCATION in CREATE DATABASE statement

2011-06-10 Thread Bob Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047772#comment-13047772
 ] 

Bob Liu commented on HIVE-1537:
---

Any idea as to when this feature will get implemented?


 Allow users to specify LOCATION in CREATE DATABASE statement
 

 Key: HIVE-1537
 URL: https://issues.apache.org/jira/browse/HIVE-1537
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Thiruvel Thirumoolan
 Attachments: hive-1537.metastore.part.patch






[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories

2011-06-10 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2201:
--

Attachment: HIVE-2201.2.patch

fix a bug.

 reduce name node calls in hive by creating temporary directories
 

 Key: HIVE-2201
 URL: https://issues.apache.org/jira/browse/HIVE-2201
 Project: Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Siying Dong
 Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch


 Currently, in Hive, when a file gets written by a FileSinkOperator,
 the sequence of operations is as follows:
 1. In tmp directory tmp1, create a tmp file _tmp_1
 2. At the end of the operator, move
 /tmp1/_tmp_1 to /tmp1/1
 3. Move directory /tmp1 to /tmp2
 4. For all files in /tmp2, remove all files starting with _tmp and
 duplicate files.
 Due to speculative execution, a lot of temporary files are created
 in /tmp1 (or /tmp2). This leads to a lot of name node calls,
 especially for large queries.
 The protocol above can be modified slightly:
 1. In tmp directory tmp1, create a tmp file _tmp_1
 2. At the end of the operator, move
 /tmp1/_tmp_1 to /tmp2/1
 3. Move directory /tmp2 to /tmp3
 4. For all files in /tmp3, remove all duplicate files.
 This should reduce the number of tmp files.
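
The modified protocol above can be sketched on a local filesystem as shown 
below. This is only a model: local directories stand in for HDFS, the 
directory and file names follow the description, the helper finish_task and 
the second attempt's file name are hypothetical, and the duplicate-file 
removal in step 4 is not modeled.

```python
import os
import tempfile

def finish_task(work_dir, done_dir, tmp_name, final_name):
    """Step 2 of the modified protocol: move a finished file out of the work dir.

    A speculative attempt that never finishes leaves its _tmp file behind in
    `work_dir`, so it never appears in `done_dir`.
    """
    os.makedirs(done_dir, exist_ok=True)
    os.rename(os.path.join(work_dir, tmp_name), os.path.join(done_dir, final_name))

root = tempfile.mkdtemp()
tmp1, tmp2, tmp3 = (os.path.join(root, d) for d in ("tmp1", "tmp2", "tmp3"))
os.makedirs(tmp1)

# Step 1: two attempts each create a temporary file; only one completes.
for name in ("_tmp_1", "_tmp_1_attempt2"):
    open(os.path.join(tmp1, name), "w").close()

finish_task(tmp1, tmp2, "_tmp_1", "1")   # step 2: /tmp1/_tmp_1 -> /tmp2/1
os.rename(tmp2, tmp3)                    # step 3: /tmp2 -> /tmp3
final_files = sorted(os.listdir(tmp3))   # step 4 finds no _tmp files to scan
leftover = sorted(os.listdir(tmp1))      # unfinished attempts stay in /tmp1
```

Because unfinished attempts never leave /tmp1, the final directory holds only 
completed files, so the cleanup pass no longer needs to look for names 
starting with _tmp.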
