Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()

2011-06-17 Thread Sohan Jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
---

(Updated 2011-06-17 21:22:00.028428)


Review request for hive and Paul Yang.


Changes
---

- made getPartitionPsQueryResults() return a parameterized type to avoid lots 
of casting


Summary
---

If a table has a large number of partitions, get_partition_names_ps() may take 
a long time to execute, because we fetch all of the partition names from the 
database and only then filter them. This is not very memory efficient, and the 
operation can be pushed down to the JDO layer without getting all of the names 
first.
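Below is a minimal Java sketch of the idea. It assumes partition names use the usual "key=value/key=value" form and that an empty value in the partial specification means "match anything". The names makePartNameMatcher and filterNames are hypothetical and do not come from the patch. The helper turns the partial specification into one regular expression. filterNames evaluates that pattern locally with java.util.regex only to show which names it would select. In a pushed-down version, a pattern like this would go to the datastore query instead, so non-matching names never leave the database. JDOQL does offer String.matches(), but implementations may support only a limited regex subset such as '.' and '.*'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PartialSpecMatcher {

  // Hypothetical helper: builds a regex for partition names of the form
  // "k1=v1/k2=v2". An empty (or missing trailing) value means "any value".
  static String makePartNameMatcher(List<String> partKeys, List<String> partVals) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < partKeys.size(); i++) {
      if (i > 0) {
        sb.append('/');
      }
      sb.append(Pattern.quote(partKeys.get(i))).append('=');
      String val = i < partVals.size() ? partVals.get(i) : "";
      // Specified values are matched literally; wildcards match any run of
      // characters other than the '/' separator.
      sb.append(val.isEmpty() ? "[^/]*" : Pattern.quote(val));
    }
    return sb.toString();
  }

  // Local stand-in for the datastore-side filter: returns the names the
  // pattern selects. When pushed down, the database applies the pattern.
  static List<String> filterNames(List<String> names, String matcher) {
    Pattern p = Pattern.compile(matcher);
    List<String> out = new ArrayList<>();
    for (String name : names) {
      if (p.matcher(name).matches()) {
        out.add(name);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("ds", "hr");
    List<String> names = Arrays.asList(
        "ds=2011-06-10/hr=11", "ds=2011-06-10/hr=12", "ds=2011-06-11/hr=12");
    // Partial spec: any ds, hr=12.
    String matcher = makePartNameMatcher(keys, Arrays.asList("", "12"));
    // prints [ds=2011-06-10/hr=12, ds=2011-06-11/hr=12]
    System.out.println(filterNames(names, matcher));
  }
}
```

Quoting the specified values keeps characters such as '.' in a date from acting as regex metacharacters.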


This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213


Diffs (updated)
---

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1136751 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1136751 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751 

Diff: https://reviews.apache.org/r/878/diff


Testing
---

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


Thanks,

Sohan



Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()

2011-06-16 Thread Paul Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review853
---



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/878/#comment1862

This line exceeds the 100-character limit.


- Paul


On 2011-06-13 21:11:38, Sohan Jain wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/878/
 ---
 
 (Updated 2011-06-13 21:11:38)
 
 
 Review request for hive and Paul Yang.
 
 
 Summary
 ---
 
 If a table has a large number of partitions, get_partition_names_ps() may 
 take a long time to execute, because we fetch all of the partition names from 
 the database and only then filter them. This is not very memory efficient, and 
 the operation can be pushed down to the JDO layer without getting all of the 
 names first.
 
 
 This addresses bug HIVE-2213.
 https://issues.apache.org/jira/browse/HIVE-2213
 
 
 Diffs
 ---
 
   trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 
   trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 
 
 Diff: https://reviews.apache.org/r/878/diff
 
 
 Testing
 ---
 
 Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
 
 
 Thanks,
 
 Sohan
 




Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()

2011-06-16 Thread Sohan Jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
---

(Updated 2011-06-16 23:30:02.425588)


Review request for hive and Paul Yang.


Changes
---

- Fixed a line that exceeded the 100-character limit


Summary
---

If a table has a large number of partitions, get_partition_names_ps() may take 
a long time to execute, because we fetch all of the partition names from the 
database and only then filter them. This is not very memory efficient, and the 
operation can be pushed down to the JDO layer without getting all of the names 
first.


This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213


Diffs (updated)
---

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 

Diff: https://reviews.apache.org/r/878/diff


Testing
---

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


Thanks,

Sohan



Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()

2011-06-16 Thread Paul Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review858
---



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/878/#comment1877

Can we make this method parameterized to reduce the number of casts 
required? E.g.

private <T> Collection<T> getPartition...

We might have to do something like <String>getPartition... when making the 
call, though.
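Here is a sketch of what that parameterization could look like. The class, method and column names are hypothetical. runUntypedQuery stands in for an untyped query call; JDO's Query.execute(), for instance, returns Object. The unchecked cast happens once inside the generic helper rather than at every call site. The explicit type witness this.<String>... is the form the comment alludes to.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class GenericResultsSketch {

  // Hypothetical stand-in for an untyped query API that returns Object.
  private Object runUntypedQuery(String resultsCol) {
    List<Object> rows = new ArrayList<>();
    if ("partitionName".equals(resultsCol)) {
      rows.add("ds=2011-06-10/hr=12");
      rows.add("ds=2011-06-11/hr=12");
    }
    return rows;
  }

  // Generic helper: the single unchecked cast lives here, not in each caller.
  @SuppressWarnings("unchecked")
  private <T> Collection<T> getPartitionResults(String resultsCol) {
    return (Collection<T>) runUntypedQuery(resultsCol);
  }

  public List<String> listPartitionNames() {
    // Explicit type witness; in this assignment context T could also be inferred.
    Collection<String> names = this.<String>getPartitionResults("partitionName");
    return new ArrayList<>(names);
  }

  public static void main(String[] args) {
    System.out.println(new GenericResultsSketch().listPartitionNames());
  }
}
```

In Java 5 through 7, a generic method's type argument is inferred from an assignment target. It is not inferred when the call appears directly as an argument to another method. That is why an explicit witness was sometimes needed at the call. The cast itself remains unchecked: asking for the wrong T still compiles, and only fails with a ClassCastException when an element is used.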


- Paul


On 2011-06-16 23:30:02, Sohan Jain wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/878/
 ---
 
 (Updated 2011-06-16 23:30:02)
 
 
 Review request for hive and Paul Yang.
 
 
 Summary
 ---
 
 If a table has a large number of partitions, get_partition_names_ps() may 
 take a long time to execute, because we fetch all of the partition names from 
 the database and only then filter them. This is not very memory efficient, and 
 the operation can be pushed down to the JDO layer without getting all of the 
 names first.
 
 
 This addresses bug HIVE-2213.
 https://issues.apache.org/jira/browse/HIVE-2213
 
 
 Diffs
 ---
 
   trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 
   trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 
 
 Diff: https://reviews.apache.org/r/878/diff
 
 
 Testing
 ---
 
 Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
 
 
 Thanks,
 
 Sohan
 




Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()

2011-06-13 Thread Sohan Jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
---

(Updated 2011-06-13 21:11:38.325243)


Review request for hive and Paul Yang.


Changes
---

- Refactored similar functions
- Renamed getPartitionNamesPs() to listPartitionNamesPs()
- Applied a similar optimization to get_partitions_ps() and 
get_partitions_ps_with_auth()


Summary
---

If a table has a large number of partitions, get_partition_names_ps() may take 
a long time to execute, because we fetch all of the partition names from the 
database and only then filter them. This is not very memory efficient, and the 
operation can be pushed down to the JDO layer without getting all of the names 
first.


This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213


Diffs (updated)
---

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 

Diff: https://reviews.apache.org/r/878/diff


Testing
---

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


Thanks,

Sohan



Review Request: HIVE-2213: Optimize get_partition_names_ps()

2011-06-10 Thread Sohan Jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
---

Review request for hive and Paul Yang.


Summary
---

If a table has a large number of partitions, get_partition_names_ps() may take 
a long time to execute, because we fetch all of the partition names from the 
database and only then filter them. This is not very memory efficient, and the 
operation can be pushed down to the JDO layer without getting all of the names 
first.


This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213


Diffs
---

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205 
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205 

Diff: https://reviews.apache.org/r/878/diff


Testing
---

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


Thanks,

Sohan



Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()

2011-06-10 Thread Paul Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review804
---


You can do this here or in a separate JIRA, but can you update 
get_partitions_ps() using a similar technique?


trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
https://reviews.apache.org/r/878/#comment1753

Can you refactor this together with the function above, since they are similar?



trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
https://reviews.apache.org/r/878/#comment1754

Same here
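One common way to act on this kind of comment is to keep a single general method and have the simpler variant delegate to it. This assumes the two functions differ only in how they treat empty partition values; the diff itself is not reproduced in this thread. The class name and signatures below are illustrative, not the ones in the patch, and they omit escaping of special characters in keys and values.

```java
import java.util.Arrays;
import java.util.List;

public class PartNameRefactorSketch {

  // General version: joins "key=value" pairs with '/'. When defaultStr is
  // non-null, empty values are replaced with it (for example ".*" when the
  // result is meant to be used as a match pattern).
  static String makePartName(List<String> cols, List<String> vals, String defaultStr) {
    if (cols.size() != vals.size()) {
      throw new IllegalArgumentException("cols and vals must have the same size");
    }
    StringBuilder name = new StringBuilder();
    for (int i = 0; i < cols.size(); i++) {
      if (i > 0) {
        name.append('/');
      }
      String val = vals.get(i) == null ? "" : vals.get(i);
      if (val.isEmpty() && defaultStr != null) {
        val = defaultStr;
      }
      name.append(cols.get(i)).append('=').append(val);
    }
    return name.toString();
  }

  // The simpler variant now delegates, so the joining loop exists only once.
  static String makePartName(List<String> cols, List<String> vals) {
    return makePartName(cols, vals, null);
  }

  public static void main(String[] args) {
    List<String> cols = Arrays.asList("ds", "hr");
    List<String> vals = Arrays.asList("2011-06-10", "");
    System.out.println(makePartName(cols, vals));       // ds=2011-06-10/hr=
    System.out.println(makePartName(cols, vals, ".*")); // ds=2011-06-10/hr=.*
  }
}
```

Delegating in one direction keeps a fix to the joining logic from having to be applied twice.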



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/878/#comment1755

To be consistent with the other method, maybe call this 
listPartitionNamesPs?



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
https://reviews.apache.org/r/878/#comment1756

Combine with above


- Paul


On 2011-06-10 07:05:56, Sohan Jain wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/878/
 ---
 
 (Updated 2011-06-10 07:05:56)
 
 
 Review request for hive and Paul Yang.
 
 
 Summary
 ---
 
 If a table has a large number of partitions, get_partition_names_ps() may 
 take a long time to execute, because we fetch all of the partition names from 
 the database and only then filter them. This is not very memory efficient, and 
 the operation can be pushed down to the JDO layer without getting all of the 
 names first.
 
 
 This addresses bug HIVE-2213.
 https://issues.apache.org/jira/browse/HIVE-2213
 
 
 Diffs
 ---
 
   trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205 
   trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205 
 
 Diff: https://reviews.apache.org/r/878/diff
 
 
 Testing
 ---
 
 Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
 
 
 Thanks,
 
 Sohan