Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
---

(Updated 2011-06-17 21:22:00.028428)

Review request for hive and Paul Yang.

Changes
---

- Made getPartitionPsQueryResults() return a parameterized type to avoid lots of casting

Summary
---

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.

This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213

Diffs (updated)
---

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1136751
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1136751
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751

Diff: https://reviews.apache.org/r/878/diff

Testing
---

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.

Thanks,
Sohan
Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review853
---

trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/878/#comment1862

    Line exceeds 100 char limit

- Paul

On 2011-06-13 21:11:38, Sohan Jain wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/878/
> ---
>
> (Updated 2011-06-13 21:11:38)
>
> Review request for hive and Paul Yang.
>
> Summary
> ---
>
> If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
>
> This addresses bug HIVE-2213.
>     https://issues.apache.org/jira/browse/HIVE-2213
>
> Diffs
> ---
>
>   trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227
>   trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227
>
> Diff: https://reviews.apache.org/r/878/diff
>
> Testing
> ---
>
> Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
>
> Thanks,
> Sohan
Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
---

(Updated 2011-06-16 23:30:02.425588)

Review request for hive and Paul Yang.

Changes
---

- Fixed line that exceeded 100 chars

Summary
---

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.

This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213

Diffs (updated)
---

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227

Diff: https://reviews.apache.org/r/878/diff

Testing
---

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.

Thanks,
Sohan
Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review858
---

trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/878/#comment1877

    Can we make this method parameterized to reduce the number of casts required? E.g. private <T> Collection<T> getPartition... We might have to do something like <String>getPartition... when making the call though.

- Paul

On 2011-06-16 23:30:02, Sohan Jain wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/878/
> ---
>
> (Updated 2011-06-16 23:30:02)
>
> Review request for hive and Paul Yang.
>
> Summary
> ---
>
> If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
>
> This addresses bug HIVE-2213.
>     https://issues.apache.org/jira/browse/HIVE-2213
>
> Diffs
> ---
>
>   trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227
>   trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227
>
> Diff: https://reviews.apache.org/r/878/diff
>
> Testing
> ---
>
> Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
>
> Thanks,
> Sohan
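The parameterization suggested in the review comment above can be illustrated with a small, self-contained sketch. This is not the code from the patch: the class name, the runQuery() stand-in, and getQueryResults() are hypothetical names chosen for illustration, and the real method in ObjectStore.java presumably takes different arguments. The point is only that a generic method keeps the one unchecked cast in a single place, and that the call site may need an explicit type argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class ParameterizedResultSketch {

  // Hypothetical stand-in for a datastore query, which hands its result back as a plain Object.
  static Object runQuery(String resultField) {
    if ("partitionName".equals(resultField)) {
      return Arrays.asList("ds=2011-06-10/hr=00", "ds=2011-06-10/hr=12");
    }
    return new ArrayList<Object>();
  }

  // Generic helper: the single unchecked cast lives here instead of at every call site.
  @SuppressWarnings("unchecked")
  static <T> Collection<T> getQueryResults(String resultField) {
    return (Collection<T>) runQuery(resultField);
  }

  static List<String> listPartitionNames() {
    // The result is passed straight into a constructor, so the type argument is given
    // explicitly. Java requires a qualifier (a class name or "this.") before "<String>".
    return new ArrayList<String>(
        ParameterizedResultSketch.<String>getQueryResults("partitionName"));
  }

  public static void main(String[] args) {
    System.out.println(listPartitionNames());
  }
}
```

The cast is still unchecked, so the helper moves the risk rather than removing it: a caller that asks for the wrong element type only fails later, when an element is read.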
Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
---

(Updated 2011-06-13 21:11:38.325243)

Review request for hive and Paul Yang.

Changes
---

- Refactored similar functions
- Renamed getPartitionNamesPs() to listPartitionNamesPs()
- Modified get_partitions_ps() and get_partitions_ps_with_auth() for a similar optimization

Summary
---

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.

This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213

Diffs (updated)
---

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227

Diff: https://reviews.apache.org/r/878/diff

Testing
---

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.

Thanks,
Sohan
Review Request: HIVE-2213: Optimize get_partition_names_ps()
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
---

Review request for hive and Paul Yang.

Summary
---

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.

This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213

Diffs
---

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205

Diff: https://reviews.apache.org/r/878/diff

Testing
---

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.

Thanks,
Sohan
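As a rough illustration of the idea in the Summary above, the sketch below turns a partial partition specification into a name pattern and applies that pattern as the selection predicate, so that only matching names (up to a limit) are ever returned. It is a simplified model, not the patch itself: the class and method names are hypothetical, an in-memory list stands in for the database, and partition values are assumed to need no path escaping. In the metastore the predicate would be part of the JDO query, whose exact form is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PartialSpecSketch {

  // Builds a regex over partition names such as "ds=2011-06-10/hr=00".
  // A missing, null, or empty value means "any value" for that column.
  static String makePartNamePattern(List<String> partCols, List<String> partVals) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < partCols.size(); i++) {
      if (i > 0) {
        sb.append('/');
      }
      String val = i < partVals.size() ? partVals.get(i) : null;
      sb.append(Pattern.quote(partCols.get(i))).append('=');
      sb.append(val == null || val.isEmpty() ? "[^/]*" : Pattern.quote(val));
    }
    return sb.toString();
  }

  // Stand-in for the store: the pattern is the filter, and scanning stops at maxParts.
  // A negative maxParts means "no limit" (an assumption of this sketch).
  static List<String> listPartitionNamesPs(List<String> stored, String pattern, int maxParts) {
    Pattern p = Pattern.compile(pattern);
    List<String> result = new ArrayList<String>();
    for (String name : stored) {
      if (maxParts >= 0 && result.size() >= maxParts) {
        break;
      }
      if (p.matcher(name).matches()) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> stored = Arrays.asList(
        "ds=2011-06-10/hr=00", "ds=2011-06-10/hr=12", "ds=2011-06-11/hr=00");
    String pattern = makePartNamePattern(
        Arrays.asList("ds", "hr"), Arrays.asList("2011-06-10", ""));
    System.out.println(listPartitionNamesPs(stored, pattern, -1));
  }
}
```

The contrast with the behaviour described in the Summary is that the caller never holds the full list of names: the partial specification travels to the store as a filter, and only the matches come back.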
Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/#review804
---

You can do this here or in a separate JIRA, but can you update get_partitions_ps() using a similar technique?

trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
https://reviews.apache.org/r/878/#comment1753

    Can you refactor with the above function since they are similar?

trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
https://reviews.apache.org/r/878/#comment1754

    Same here

trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/878/#comment1755

    To be consistent with the other method, maybe call this listPartitionNamesPs?

trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
https://reviews.apache.org/r/878/#comment1756

    Combine with above

- Paul

On 2011-06-10 07:05:56, Sohan Jain wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/878/
> ---
>
> (Updated 2011-06-10 07:05:56)
>
> Review request for hive and Paul Yang.
>
> Summary
> ---
>
> If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
>
> This addresses bug HIVE-2213.
>     https://issues.apache.org/jira/browse/HIVE-2213
>
> Diffs
> ---
>
>   trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205
>   trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205
>
> Diff: https://reviews.apache.org/r/878/diff
>
> Testing
> ---
>
> Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.
>
> Thanks,
> Sohan