Re: Review Request: HIVE-2036: Update bitmap indexes for automatic usage
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/857/
-----------------------------------------------------------

(Updated 2011-06-10 06:35:32.125295)

Review request for hive and John Sichi.

Changes
-------

Based on a discussion with yongqian, I re-implemented the predicate decomposition in two steps: first computing the overall residual predicate from the union of all columns in the available indexes, and then computing the predicates to apply to each index individually. I have also extended the functionality to pass partition columns in to allowColumnNames, and added/extended the testcases to check that partition predicates are propagated correctly. This required adding a check in IndexWhereProcessor.java that the correct FilterOperator was passed to the process(...) method (apparently a duplicate FilterOperator that does not have the entire predicate gets created).

Summary
-------

Add support for generating index queries to support automatic usage of bitmap indexes. This required changing the interface to the IndexHandlers to support accepting queries on multiple indexes. The compact indexes were modified to use this new interface as well, although no functional changes were made to how they work.

Only supports AND predicates right now, but it should be possible to extend the BitmapQuery interface defined in this patch to easily support OR predicates as well. Currently benchmarking these changes on a test cluster.

This addresses bug HIVE-2036.
    https://issues.apache.org/jira/browse/HIVE-2036

Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 4fba845
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java e5ee183
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java af9d7b1
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapInnerQuery.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapOuterQuery.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapQuery.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 56e7609
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java d64e88b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java 268560d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 0873e1a
  ql/src/test/queries/clientpositive/index_auto_partitioned.q 5f92f04
  ql/src/test/queries/clientpositive/index_bitmap_auto.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_bitmap_auto_partitioned.q PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_partitioned.q.out 05cc84a
  ql/src/test/results/clientpositive/index_bitmap_auto.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/857/diff

Testing
-------

Passes unit tests. An additional testcase for automatic bitmap indexing, index_bitmap_auto.q, was also added to the TestCliDriver suite. Currently benchmarking changes on a test cluster.

Thanks,

Syed
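For readers following the thread, the two-step predicate decomposition mentioned in the review request can be pictured with a small Python sketch. This is only a toy model under stated assumptions, not the Java code in the patch: a predicate is assumed to be a flat AND of (column, operator, literal) conjuncts, and the names `decompose`, `conjuncts` and `indexes` are hypothetical.

```python
def decompose(conjuncts, indexes):
    """Split an AND-only predicate in two steps (illustrative only).

    conjuncts: list of (column, operator, literal) tuples joined by AND.
    indexes:   dict mapping a hypothetical index name to the set of
               columns that index covers.
    """
    # Step 1: the residual predicate is whatever no index can answer,
    # judged against the union of all indexed columns.
    covered = set().union(*indexes.values())
    residual = [c for c in conjuncts if c[0] not in covered]

    # Step 2: each index receives only the conjuncts on its own columns.
    per_index = {}
    for name, cols in indexes.items():
        usable = [c for c in conjuncts if c[0] in cols]
        if usable:  # skip indexes that have nothing to evaluate
            per_index[name] = usable
    return residual, per_index
```

With conjuncts on `key`, `value` and `ds`, and one index each on `key` and `value`, the `ds` conjunct ends up in the residual while each index gets its own single conjunct. OR predicates are deliberately left out, matching the AND-only scope stated in the summary.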
[jira] [Commented] (HIVE-2036) Update bitmap indexes for automatic usage
[ https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047029#comment-13047029 ]

jirapos...@reviews.apache.org commented on HIVE-2036:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/857/
-----------------------------------------------------------

(Updated 2011-06-10 06:35:32.125295)

Review request for hive and John Sichi.

Changes
-------

Based on a discussion with yongqian, I re-implemented the predicate decomposition in two steps: first computing the overall residual predicate from the union of all columns in the available indexes, and then computing the predicates to apply to each index individually. I have also extended the functionality to pass partition columns in to allowColumnNames, and added/extended the testcases to check that partition predicates are propagated correctly. This required adding a check in IndexWhereProcessor.java that the correct FilterOperator was passed to the process(...) method (apparently a duplicate FilterOperator that does not have the entire predicate gets created).

Summary
-------

Add support for generating index queries to support automatic usage of bitmap indexes. This required changing the interface to the IndexHandlers to support accepting queries on multiple indexes. The compact indexes were modified to use this new interface as well, although no functional changes were made to how they work.

Only supports AND predicates right now, but it should be possible to extend the BitmapQuery interface defined in this patch to easily support OR predicates as well. Currently benchmarking these changes on a test cluster.

This addresses bug HIVE-2036.
    https://issues.apache.org/jira/browse/HIVE-2036

Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 4fba845
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java e5ee183
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java af9d7b1
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapInnerQuery.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapOuterQuery.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapQuery.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 56e7609
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java d64e88b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java 268560d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java 0873e1a
  ql/src/test/queries/clientpositive/index_auto_partitioned.q 5f92f04
  ql/src/test/queries/clientpositive/index_bitmap_auto.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_bitmap_auto_partitioned.q PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_partitioned.q.out 05cc84a
  ql/src/test/results/clientpositive/index_bitmap_auto.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/857/diff

Testing
-------

Passes unit tests. An additional testcase for automatic bitmap indexing, index_bitmap_auto.q, was also added to the TestCliDriver suite. Currently benchmarking changes on a test cluster.

Thanks,

Syed

Update bitmap indexes for automatic usage
-----------------------------------------

Key: HIVE-2036
URL: https://issues.apache.org/jira/browse/HIVE-2036
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
Attachments: HIVE-2036.1.patch

HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap index support. The bitmap code will need to be extended after it is committed to enable automatic use of indexing. Most work will be focused in the BitmapIndexHandler, which needs to generate the re-entrant QL index query. There may also be significant work in the IndexPredicateAnalyzer to support predicates with ORs, instead of just ANDs as it is currently.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2213) Optimize get_partition_names_ps()
Optimize get_partition_names_ps()
---------------------------------

Key: HIVE-2213
URL: https://issues.apache.org/jira/browse/HIVE-2213
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
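To illustrate what "pushing the operation down" could mean here, the sketch below lets the database do the filtering instead of loading every partition name into memory first. It is a rough analogy only: `sqlite3` stands in for the metastore's backing store, the `partitions` table, the `ds=.../hr=...` name layout, and the helpers `partial_spec_pattern`, `partition_names_ps` and `demo_connection` are all assumptions made for this example, and the JDO layer used by the real patch is not modelled.

```python
import sqlite3

def glob_escape(text):
    # Neutralise GLOB metacharacters so values are matched literally.
    return "".join("[" + ch + "]" if ch in "*?[" else ch for ch in text)

def partial_spec_pattern(part_cols, partial_vals):
    """Build a GLOB pattern such as 'ds=2008-04-08/hr=*'.

    An empty or missing value means "any value" for that column.
    """
    pieces = []
    for i, col in enumerate(part_cols):
        val = partial_vals[i] if i < len(partial_vals) else ""
        pieces.append(glob_escape(col) + "=" + (glob_escape(val) if val else "*"))
    return "/".join(pieces)

def partition_names_ps(conn, part_cols, partial_vals, max_parts=-1):
    # The filter and the limit run inside the database, so only the
    # matching names ever reach the caller.
    sql = "SELECT part_name FROM partitions WHERE part_name GLOB ? ORDER BY part_name"
    params = [partial_spec_pattern(part_cols, partial_vals)]
    if max_parts >= 0:
        sql += " LIMIT ?"
        params.append(max_parts)
    return [row[0] for row in conn.execute(sql, params)]

def demo_connection():
    # Hypothetical in-memory stand-in for the metastore database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE partitions (part_name TEXT)")
    names = [f"ds={ds}/hr={hr}"
             for ds in ("2008-04-08", "2008-04-09") for hr in ("11", "12")]
    conn.executemany("INSERT INTO partitions VALUES (?)", [(n,) for n in names])
    return conn
```

Calling `partition_names_ps(demo_connection(), ["ds", "hr"], ["", "11"])` returns only the two `hr=11` partitions. The memory saving the issue describes comes from never materialising the non-matching names on the client side.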
[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2213:
----------------------------

Attachment: HIVE-2213.1.patch

Optimize get_partition_names_ps()
---------------------------------

Key: HIVE-2213
URL: https://issues.apache.org/jira/browse/HIVE-2213
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Attachments: HIVE-2213.1.patch
Review Request: HIVE-2213: Optimize get_partition_names_ps()
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------

Review request for hive and Paul Yang.

Summary
-------

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.

This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213

Diffs
-----

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205

Diff: https://reviews.apache.org/r/878/diff

Testing
-------

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.

Thanks,

Sohan
[jira] [Commented] (HIVE-243) ^C breaks out of running query, but not whole CLI
[ https://issues.apache.org/jira/browse/HIVE-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047043#comment-13047043 ]

Hudson commented on HIVE-243:
-----------------------------

Integrated in Hive-trunk-h0.21 #771 (See [https://builds.apache.org/job/Hive-trunk-h0.21/771/])
HIVE-2211. Fix a bug caused by HIVE-243 (Siying Dong via Ning Zhang)

nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1134179
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java

^C breaks out of running query, but not whole CLI
-------------------------------------------------

Key: HIVE-243
URL: https://issues.apache.org/jira/browse/HIVE-243
Project: Hive
Issue Type: Wish
Components: Query Processor
Affects Versions: 0.8.0
Reporter: Adam Kramer
Assignee: George Djabarov
Fix For: 0.8.0
Attachments: HIVE-243.patch

It would be lovely if, when I know a query is bad, I could just ^C out of it. I can do that now, but the whole CLI quits. It'd be quite nice if it took an extra ^C to break the CLI, or if there was some control character to break out of a query without breaking out of the CLI.
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047041#comment-13047041 ]

jirapos...@reviews.apache.org commented on HIVE-2213:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/878/
-----------------------------------------------------------

Review request for hive and Paul Yang.

Summary
-------

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.

This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213

Diffs
-----

  trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205
  trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205

Diff: https://reviews.apache.org/r/878/diff

Testing
-------

Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.

Thanks,

Sohan

Optimize get_partition_names_ps()
---------------------------------

Key: HIVE-2213
URL: https://issues.apache.org/jira/browse/HIVE-2213
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Attachments: HIVE-2213.1.patch

If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we fetch all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
[jira] [Commented] (HIVE-2208) create a new API in Warehouse where the root directory is specified
[ https://issues.apache.org/jira/browse/HIVE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047042#comment-13047042 ]

Hudson commented on HIVE-2208:
------------------------------

Integrated in Hive-trunk-h0.21 #771 (See [https://builds.apache.org/job/Hive-trunk-h0.21/771/])

create a new API in Warehouse where the root directory is specified
-------------------------------------------------------------------

Key: HIVE-2208
URL: https://issues.apache.org/jira/browse/HIVE-2208
Project: Hive
Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.2208.1.patch

It would be useful to create tables in multiple DFS's
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2201:
-----------------------------

Assignee: Siying Dong
Summary: reduce name node calls in hive by creating temporary directories (was: remove name node calls in hive by creating temporary directories)

reduce name node calls in hive by creating temporary directories
----------------------------------------------------------------

Key: HIVE-2201
URL: https://issues.apache.org/jira/browse/HIVE-2201
Project: Hive
Issue Type: Improvement
Reporter: Namit Jain
Assignee: Siying Dong

Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows:

1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
3. Move directory /tmp1 to /tmp2
4. For all files in /tmp2, remove all files starting with _tmp and duplicate files.

Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, especially for large queries.

The protocol above can be modified slightly:

1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
3. Move directory /tmp2 to /tmp3
4. For all files in /tmp3, remove all duplicate files.

This should reduce the number of tmp files.
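The modified protocol in the issue description can be tried out on a local filesystem with the sketch below. It is a simulation under assumptions, not Hive's FileSinkOperator: plain `os.rename` stands in for DFS moves, output files are assumed to be named `<task>_<attempt>`, and the helpers `write_attempt`, `commit_job` and `demo` are hypothetical.

```python
import os
import tempfile

def write_attempt(root, task_id, attempt, data, finish=True):
    """Steps 1 and 2: write tmp1/_tmp_<name>, then move it to tmp2/<name>."""
    tmp1 = os.path.join(root, "tmp1")
    tmp2 = os.path.join(root, "tmp2")
    os.makedirs(tmp1, exist_ok=True)
    os.makedirs(tmp2, exist_ok=True)
    name = f"{task_id}_{attempt}"
    scratch = os.path.join(tmp1, "_tmp_" + name)
    with open(scratch, "w") as f:
        f.write(data)
    if finish:  # a killed attempt never reaches this move
        os.rename(scratch, os.path.join(tmp2, name))

def commit_job(root):
    """Steps 3 and 4: move tmp2 to tmp3, then drop duplicate task outputs."""
    tmp3 = os.path.join(root, "tmp3")
    os.rename(os.path.join(root, "tmp2"), tmp3)
    seen = set()
    for name in sorted(os.listdir(tmp3)):
        task_id = name.rsplit("_", 1)[0]
        if task_id in seen:
            os.remove(os.path.join(tmp3, name))  # speculative duplicate
        else:
            seen.add(task_id)
    return sorted(os.listdir(tmp3))

def demo():
    root = tempfile.mkdtemp()
    write_attempt(root, "000000", 0, "a")
    write_attempt(root, "000000", 1, "a")                # speculative duplicate
    write_attempt(root, "000001", 0, "b", finish=False)  # killed attempt
    write_attempt(root, "000001", 1, "b")
    return commit_job(root), sorted(os.listdir(os.path.join(root, "tmp1")))
```

In this toy run the leftover `_tmp_` file of the killed attempt stays behind in tmp1, so the final directory never has to be scanned for `_tmp` files; only duplicates are removed there.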
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2201:
-----------------------------

Attachment: HIVE-2201.1.patch

Implemented the logic. Discovered one problem: when moving from /tmp1/_tmp_1 to /tmp2/1, we might need to check whether /tmp2 exists before moving it. This patch avoids this call by pre-creating the temp directory before submitting the job. However, we cannot do that for dynamic partitioning, as we don't know the directory names in advance. So for dynamic partitioning, some extra cost is added for DFS namenode reads. So far I think this tradeoff is worthwhile. Potentially this cost can be reduced by caching the directories created. We can try that approach as a followup.

reduce name node calls in hive by creating temporary directories
----------------------------------------------------------------

Key: HIVE-2201
URL: https://issues.apache.org/jira/browse/HIVE-2201
Project: Hive
Issue Type: Improvement
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE-2201.1.patch

Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows:

1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
3. Move directory /tmp1 to /tmp2
4. For all files in /tmp2, remove all files starting with _tmp and duplicate files.

Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, especially for large queries.

The protocol above can be modified slightly:

1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
3. Move directory /tmp2 to /tmp3
4. For all files in /tmp3, remove all duplicate files.

This should reduce the number of tmp files.
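The follow-up idea in the comment, caching the directories already created so that dynamic partitioning does not pay a namenode read per file, might look something like the sketch below. It is a guess at the shape of such a cache, not code from the patch: the `DirCache` class, its `fs_calls` counter and the `demo` helper are hypothetical, and local `os.makedirs` stands in for a DFS call.

```python
import os
import tempfile

class DirCache:
    """Remember directories this task has already ensured exist."""

    def __init__(self):
        self._known = set()
        self.fs_calls = 0  # calls that would reach the namenode on a DFS

    def ensure(self, path):
        if path in self._known:
            return  # answered from the cache, no filesystem round trip
        self.fs_calls += 1
        os.makedirs(path, exist_ok=True)
        self._known.add(path)

def demo():
    cache = DirCache()
    base = tempfile.mkdtemp()
    target = os.path.join(base, "ds=2008-04-08", "hr=11")
    for _ in range(3):  # e.g. three files landing in the same partition
        cache.ensure(target)
    return cache.fs_calls, os.path.isdir(target)
```

Here three writes into the same dynamic partition directory cost one filesystem call instead of three. A real implementation would also have to decide how long cached entries stay valid, which this sketch ignores.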
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2201:
-----------------------------

Status: Patch Available (was: In Progress)

reduce name node calls in hive by creating temporary directories
----------------------------------------------------------------

Key: HIVE-2201
URL: https://issues.apache.org/jira/browse/HIVE-2201
Project: Hive
Issue Type: Improvement
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE-2201.1.patch
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2201:
-----------------------------

Attachment: (was: HIVE-2201.1.patch)

reduce name node calls in hive by creating temporary directories
----------------------------------------------------------------

Key: HIVE-2201
URL: https://issues.apache.org/jira/browse/HIVE-2201
Project: Hive
Issue Type: Improvement
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE-2201.1.patch
[jira] [Work started] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-2201 started by Siying Dong.

reduce name node calls in hive by creating temporary directories
----------------------------------------------------------------

Key: HIVE-2201
URL: https://issues.apache.org/jira/browse/HIVE-2201
Project: Hive
Issue Type: Improvement
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE-2201.1.patch
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2201:
-----------------------------

Attachment: HIVE-2201.1.patch

reduce name node calls in hive by creating temporary directories
----------------------------------------------------------------

Key: HIVE-2201
URL: https://issues.apache.org/jira/browse/HIVE-2201
Project: Hive
Issue Type: Improvement
Reporter: Namit Jain
Assignee: Siying Dong
Attachments: HIVE-2201.1.patch
[jira] [Updated] (HIVE-2209) Provide a way by which ObjectInspectorUtils.compare can be extended by the caller for comparing maps which are part of the object
[ https://issues.apache.org/jira/browse/HIVE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar updated HIVE-2209:
-------------------------------

Attachment: HIVE-2209v0.patch

Patch, with tests, added.

Provide a way by which ObjectInspectorUtils.compare can be extended by the caller for comparing maps which are part of the object
---------------------------------------------------------------------------------------------------------------------------------

Key: HIVE-2209
URL: https://issues.apache.org/jira/browse/HIVE-2209
Project: Hive
Issue Type: Improvement
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
Attachments: HIVE-2209v0.patch

Now ObjectInspectorUtils.compare throws an exception if a map is contained (recursively) within the objects being compared. Two obvious implementations are

- a simple map comparer which assumes keys of the first map can be used to fetch values from the second
- a 'cross-product' comparer which compares every pair of key-value pairs in the two maps, and calls a match if and only if all pairs are matched

Note that it would be difficult to provide a transitive greater-than/less-than indication with maps so that is not in scope.
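The difference between the two comparers described in the issue can be seen in a short Python sketch. It only mirrors the idea and is not the Java in the attached patch: the function names are hypothetical, both functions answer equal/not-equal only (no ordering, as the issue scopes it), and the cross-product version assumes `key_eq` is an equivalence under which each map's keys are distinct.

```python
def simple_map_equal(m1, m2, val_eq=lambda a, b: a == b):
    """Use the keys of the first map to fetch values from the second."""
    if len(m1) != len(m2):
        return False
    for key, value in m1.items():
        if key not in m2 or not val_eq(value, m2[key]):
            return False
    return True

def cross_map_equal(m1, m2, key_eq, val_eq):
    """Compare entry against entry; equal only if every entry finds a partner."""
    if len(m1) != len(m2):
        return False
    unmatched = list(m2.items())
    for k1, v1 in m1.items():
        for i, (k2, v2) in enumerate(unmatched):
            if key_eq(k1, k2) and val_eq(v1, v2):
                del unmatched[i]  # each entry of m2 may be used once
                break
        else:
            return False  # no partner found for (k1, v1)
    return True

def same_text(a, b):
    # Hypothetical equality for keys stored in different representations.
    return str(a) == str(b)
```

For example, `{1: "a"}` and `{"1": "a"}` are unequal under `simple_map_equal`, because the key `1` cannot be used to look up `"1"`, but equal under `cross_map_equal` with `same_text` as the key comparison. The price is a quadratic number of comparisons, which is presumably why both variants are offered.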
[jira] [Commented] (HIVE-2188) Add get_table_objects_by_name() to Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047150#comment-13047150 ]

Hudson commented on HIVE-2188:
------------------------------

Integrated in Hive-trunk-h0.21 #772 (See [https://builds.apache.org/job/Hive-trunk-h0.21/772/])
HIVE-2188. Add get_table_objects_by_name() to Hive MetaStore (Sohan Jain via cws)

cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1134183
Files :
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h
* /hive/trunk/metastore/if/hive_metastore.thrift
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
* /hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb

Add get_table_objects_by_name() to Hive MetaStore
-------------------------------------------------

Key: HIVE-2188
URL: https://issues.apache.org/jira/browse/HIVE-2188
Project: Hive
Issue Type: New Feature
Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Priority: Minor
Fix For: 0.8.0
Attachments: HIVE-2188.1.patch, HIVE-2188.3.patch

This function would get multiple tables from the hive metastore as opposed to just one at a time, saving round trip time to the metastore.
[jira] [Updated] (HIVE-2209) Provide a way by which ObjectInspectorUtils.compare can be extended by the caller for comparing maps which are part of the object
[ https://issues.apache.org/jira/browse/HIVE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar updated HIVE-2209:
-------------------------------

Status: Patch Available (was: Open)

For review by He Yongqiang

Provide a way by which ObjectInspectorUtils.compare can be extended by the caller for comparing maps which are part of the object
---------------------------------------------------------------------------------------------------------------------------------

Key: HIVE-2209
URL: https://issues.apache.org/jira/browse/HIVE-2209
Project: Hive
Issue Type: Improvement
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
Attachments: HIVE-2209v0.patch
Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/879/
-----------------------------------------------------------

Review request for hive and Yongqiang He.

Summary
-------

Patch for HIVE-2209

Diffs
-----

  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072
  serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualcomparer.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualcomparer.java PRE-CREATION

Diff: https://reviews.apache.org/r/879/diff

Testing
-------

Tests added

Thanks,

Krishna
[jira] [Updated] (HIVE-2036) Update bitmap indexes for automatic usage
[ https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Syed S. Albiz updated HIVE-2036:
-------------------------------

Attachment: HIVE-2036.3.patch

This patch is still WIP; there are a couple of issues I know still need correcting. In particular, the index_auto_unused.q testcase fails now that I have updated the partition predicates to propagate properly: there was no check to make sure that the index was built on the partition being queried (the testcase previously passed only because partition predicates weren't propagated anyway). I probably also want to refactor the logic in IndexWhereProcessor before this is ready.

Update bitmap indexes for automatic usage
-----------------------------------------

Key: HIVE-2036
URL: https://issues.apache.org/jira/browse/HIVE-2036
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Syed S. Albiz
Attachments: HIVE-2036.1.patch, HIVE-2036.3.patch

HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap index support. The bitmap code will need to be extended after it is committed to enable automatic use of indexing. Most work will be focused in the BitmapIndexHandler, which needs to generate the re-entrant QL index query. There may also be significant work in the IndexPredicateAnalyzer to support predicates with ORs, instead of just ANDs as it is currently.
Build failed in Jenkins: Hive-branch-0.7.1-h0.21 #19
See https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/19/

------------------------------------------
[...truncated 27383 lines...]
    [junit] POSTHOOK: query: create table testhivedrivertable (num int)
    [junit] POSTHOOK: type: CREATETABLE
    [junit] POSTHOOK: Output: default@testhivedrivertable
    [junit] OK
    [junit] PREHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt
    [junit] Loading data to table default.testhivedrivertable
    [junit] POSTHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@testhivedrivertable
    [junit] OK
    [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
    [junit] PREHOOK: type: QUERY
    [junit] PREHOOK: Input: default@testhivedrivertable
    [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-06-10_12-47-51_236_3118745972910338142/-mr-1
    [junit] Total MapReduce jobs = 1
    [junit] Launching Job 1 out of 1
    [junit] Number of reduce tasks determined at compile time: 1
    [junit] In order to change the average load for a reducer (in bytes):
    [junit]   set hive.exec.reducers.bytes.per.reducer=number
    [junit] In order to limit the maximum number of reducers:
    [junit]   set hive.exec.reducers.max=number
    [junit] In order to set a constant number of reducers:
    [junit]   set mapred.reduce.tasks=number
    [junit] Job running in-process (local Hadoop)
    [junit] 2011-06-10 12:47:54,280 null map = 100%, reduce = 100%
    [junit] Ended Job = job_local_0001
    [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
    [junit] POSTHOOK: type: QUERY
    [junit] POSTHOOK: Input: default@testhivedrivertable
    [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-06-10_12-47-51_236_3118745972910338142/-mr-1
    [junit] OK
    [junit] PREHOOK: query: drop table testhivedrivertable
    [junit] PREHOOK: type: DROPTABLE
    [junit] PREHOOK: Input: default@testhivedrivertable
    [junit] PREHOOK: Output: default@testhivedrivertable
    [junit] POSTHOOK: query: drop table testhivedrivertable
    [junit] POSTHOOK: type: DROPTABLE
    [junit] POSTHOOK: Input: default@testhivedrivertable
    [junit] POSTHOOK: Output: default@testhivedrivertable
    [junit] OK
    [junit] Hive history file=https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201106101247_1556750958.txt
    [junit] PREHOOK: query: drop table testhivedrivertable
    [junit] PREHOOK: type: DROPTABLE
    [junit] POSTHOOK: query: drop table testhivedrivertable
    [junit] POSTHOOK: type: DROPTABLE
    [junit] OK
    [junit] PREHOOK: query: create table testhivedrivertable (num int)
    [junit] PREHOOK: type: CREATETABLE
    [junit] POSTHOOK: query: create table testhivedrivertable (num int)
    [junit] POSTHOOK: type: CREATETABLE
    [junit] POSTHOOK: Output: default@testhivedrivertable
    [junit] OK
    [junit] PREHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt
    [junit] Loading data to table default.testhivedrivertable
    [junit] POSTHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-branch-0.7.1-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@testhivedrivertable
    [junit] OK
    [junit] PREHOOK: query: select * from testhivedrivertable limit 10
    [junit] PREHOOK: type: QUERY
    [junit] PREHOOK: Input: default@testhivedrivertable
    [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-06-10_12-47-56_568_7709338130334341560/-mr-1
    [junit] POSTHOOK: query: select * from testhivedrivertable limit 10
    [junit] POSTHOOK: type: QUERY
    [junit] POSTHOOK: Input: default@testhivedrivertable
    [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-06-10_12-47-56_568_7709338130334341560/-mr-1
    [junit] OK
    [junit] PREHOOK: query: drop table testhivedrivertable
    [junit] PREHOOK: type: DROPTABLE
    [junit] PREHOOK: Input: default@testhivedrivertable
    [junit] PREHOOK: Output: default@testhivedrivertable
    [junit] POSTHOOK: query: drop table testhivedrivertable
    [junit] POSTHOOK: type: DROPTABLE
    [junit] POSTHOOK: Input: default@testhivedrivertable
    [junit] POSTHOOK: Output: default@testhivedrivertable
    [junit] OK
    [junit] Hive history
[jira] [Assigned] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franklin Hu reassigned HIVE-2035: - Assignee: Franklin Hu Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Currently, if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true, the intermediate data can be merged using an additional MapReduce job. This can be quite expensive if the data size is large. With HIVE-1950, merging can be done at the RCFile block level, so that it bypasses the (de-)compression and (de-)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored as RCFile but the destination table is (which requires that the intermediate data be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franklin Hu updated HIVE-2035: -- Attachment: hive-2035.1.patch Implements block-level merge of intermediate results to a table or partition stored as RCFile. Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Attachments: hive-2035.1.patch Currently, if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true, the intermediate data can be merged using an additional MapReduce job. This can be quite expensive if the data size is large. With HIVE-1950, merging can be done at the RCFile block level, so that it bypasses the (de-)compression and (de-)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored as RCFile but the destination table is (which requires that the intermediate data be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
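The idea of merging at the block level, without decompressing or deserializing, can be illustrated with a minimal sketch. It assumes a toy container format (a sequence of length-prefixed, deflate-compressed blocks) that is NOT the real RCFile layout, and the class and method names are hypothetical; it is not taken from hive-2035.1.patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Toy container: a sequence of [int length][compressed bytes] blocks. Not the RCFile format. */
public class BlockMergeSketch {

    /** Writes each record as one compressed, length-prefixed block. */
    static byte[] writeFile(List<String> records) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(file);
        for (String record : records) {
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            try (DeflaterOutputStream z = new DeflaterOutputStream(block)) {
                z.write(record.getBytes(StandardCharsets.UTF_8));
            }
            out.writeInt(block.size());
            block.writeTo(out);
        }
        out.flush();
        return file.toByteArray();
    }

    /** Block-level merge: copies each block as opaque bytes; nothing is inflated or parsed. */
    static byte[] mergeBlocks(List<byte[]> files) throws IOException {
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(merged);
        for (byte[] file : files) {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            while (in.available() > 0) {
                int length = in.readInt();
                byte[] block = new byte[length];
                in.readFully(block);
                out.writeInt(length);
                out.write(block);
            }
        }
        out.flush();
        return merged.toByteArray();
    }

    /** Reads a file back by inflating every block; used here only to check the merge result. */
    static List<String> readFile(byte[] file) throws IOException {
        List<String> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        while (in.available() > 0) {
            byte[] block = new byte[in.readInt()];
            in.readFully(block);
            try (InflaterInputStream z = new InflaterInputStream(new ByteArrayInputStream(block))) {
                records.add(new String(z.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
        return records;
    }
}
```

In this sketch only readFile pays the decompression cost; mergeBlocks touches lengths and raw bytes only, which is the saving the issue description is after. A real RCFile merge would also have to deal with file headers and sync markers, which the toy format does not have.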
Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/#review804 --- You can do this here or in a separate JIRA, but can you update get_partitions_ps() using a similar technique? trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java https://reviews.apache.org/r/878/#comment1753 Can you refactor with the above function since they are similar? trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java https://reviews.apache.org/r/878/#comment1754 Same here trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/878/#comment1755 To be consistent with the other method, maybe call this listPartitionNamesPs? trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java https://reviews.apache.org/r/878/#comment1756 Combine with above - Paul On 2011-06-10 07:05:56, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/ --- (Updated 2011-06-10 07:05:56) Review request for hive and Paul Yang. Summary --- If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. This addresses bug HIVE-2213. 
https://issues.apache.org/jira/browse/HIVE-2213 Diffs - trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205 Diff: https://reviews.apache.org/r/878/diff Testing --- Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore. Thanks, Sohan
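One way to picture the pushdown described in the summary is to turn the partial partition spec into a single name pattern that the storage layer can evaluate, instead of fetching every name and filtering in memory. The sketch below is an assumption about the general technique, not the code in the patch: the class and method names are hypothetical, partition names are assumed to look like "ds=2011-06-10/hr=12", and the path-name escaping that Hive applies to partition values is ignored.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PartialSpecPattern {

    /**
     * Builds a regex over partition names of the form "key1=val1/key2=val2".
     * An empty or missing value in the partial spec matches any value for that key.
     */
    static String toNamePattern(List<String> partKeys, List<String> partialVals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < partKeys.size(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            // Quote the literal parts so that regex metacharacters in keys or values are harmless.
            sb.append(Pattern.quote(partKeys.get(i) + "="));
            String val = i < partialVals.size() ? partialVals.get(i) : "";
            sb.append(val.isEmpty() ? "[^/]*" : Pattern.quote(val));
        }
        return sb.toString();
    }
}
```

With a pattern like this, a datastore query could filter on the stored partition name directly, so only the matching names would be loaded. Whether the JDO layer can evaluate such a pattern efficiently depends on the backing database, which this sketch does not address.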
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047452#comment-13047452 ] jirapos...@reviews.apache.org commented on HIVE-2213: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/#review804 --- You can do this here or in a separate JIRA, but can you update get_partitions_ps() using a similar technique? trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java https://reviews.apache.org/r/878/#comment1753 Can you refactor with the above function since they are similar? trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java https://reviews.apache.org/r/878/#comment1754 Same here trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/878/#comment1755 To be consistent with the other method, maybe call this listPartitionNamesPs? trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java https://reviews.apache.org/r/878/#comment1756 Combine with above - Paul On 2011-06-10 07:05:56, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/ --- (Updated 2011-06-10 07:05:56) Review request for hive and Paul Yang. Summary --- If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. This addresses bug HIVE-2213. https://issues.apache.org/jira/browse/HIVE-2213 Diffs - trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205 Diff: https://reviews.apache.org/r/878/diff Testing --- Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore. Thanks, Sohan Optimize get_partition_names_ps() - Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2213.1.patch If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2215) Add api for marking / querying set of partitions for events
Add api for marking / querying set of partitions for events --- Key: HIVE-2215 URL: https://issues.apache.org/jira/browse/HIVE-2215 Project: Hive Issue Type: New Feature Components: Metastore Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2215) Add api for marking / querying set of partitions for events
[ https://issues.apache.org/jira/browse/HIVE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-2215: --- Attachment: hive_2215.patch Patch including generated code. Will post on RB without generated code. Incorporates feedback from John on HIVE-2147 Add api for marking / querying set of partitions for events --- Key: HIVE-2215 URL: https://issues.apache.org/jira/browse/HIVE-2215 Project: Hive Issue Type: New Feature Components: Metastore Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: hive_2215.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2215) Add api for marking / querying set of partitions for events
[ https://issues.apache.org/jira/browse/HIVE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-2215: --- Status: Patch Available (was: Open) This patch is ready for review. Add api for marking / querying set of partitions for events --- Key: HIVE-2215 URL: https://issues.apache.org/jira/browse/HIVE-2215 Project: Hive Issue Type: New Feature Components: Metastore Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: hive_2215.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2215) Add api for marking / querying set of partitions for events
[ https://issues.apache.org/jira/browse/HIVE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047466#comment-13047466 ] jirapos...@reviews.apache.org commented on HIVE-2215: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/883/ --- Review request for hive and John Sichi. Summary --- Follow-up for HIVE-2147. This addresses bug HIVE-2215. https://issues.apache.org/jira/browse/HIVE-2215 Diffs - trunk/metastore/if/hive_metastore.thrift 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/MarkPartitionEvent.java PRE-CREATION trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionEvent.java PRE-CREATION trunk/metastore/src/model/package.jdo 1134443 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 1134443 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartitionSet.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 1134443 Diff: https://reviews.apache.org/r/883/diff Testing --- Added test cases for new api. 
Thanks, Ashutosh Add api for marking / querying set of partitions for events --- Key: HIVE-2215 URL: https://issues.apache.org/jira/browse/HIVE-2215 Project: Hive Issue Type: New Feature Components: Metastore Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: hive_2215.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2147) Add api to send / receive message to metastore
[ https://issues.apache.org/jira/browse/HIVE-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047468#comment-13047468 ] Paul Yang commented on HIVE-2147: - I agree with John's suggestion for PARTITION_EVENTS. For this event table, when will rows be dropped? Also, when partitions are represented using a string, we've followed the convention that they are called partition names. Can we use that for MPartitionSet? Since MPartitionSet.partVals is a string, we should make it indexed, much like partitionName for the PARTITION table. Add api to send / receive message to metastore -- Key: HIVE-2147 URL: https://issues.apache.org/jira/browse/HIVE-2147 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: api-without-thrift.patch, hive_2147-2.patch This is follow-up work on HIVE-2038. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2147) Add api to send / receive message to metastore
[ https://issues.apache.org/jira/browse/HIVE-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-2147: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) As suggested, HIVE-2215 has been opened for this. Add api to send / receive message to metastore -- Key: HIVE-2147 URL: https://issues.apache.org/jira/browse/HIVE-2147 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: api-without-thrift.patch, hive_2147-2.patch This is follow-up work on HIVE-2038. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-2215
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/883/ --- Review request for hive and John Sichi. Summary --- Follow-up for HIVE-2147. This addresses bug HIVE-2215. https://issues.apache.org/jira/browse/HIVE-2215 Diffs - trunk/metastore/if/hive_metastore.thrift 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134443 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/MarkPartitionEvent.java PRE-CREATION trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionEvent.java PRE-CREATION trunk/metastore/src/model/package.jdo 1134443 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 1134443 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMarkPartitionSet.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 1134443 Diff: https://reviews.apache.org/r/883/diff Testing --- Added test cases for new api. Thanks, Ashutosh
[jira] [Commented] (HIVE-2215) Add api for marking / querying set of partitions for events
[ https://issues.apache.org/jira/browse/HIVE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047506#comment-13047506 ] Ashutosh Chauhan commented on HIVE-2215: Replying to Paul's comments here, since I closed HIVE-2147: bq. I agree with John's suggestion for PARTITION_EVENTS. For this event table, when will rows be dropped? This also needs to be considered. I would prefer to do it in a follow-up JIRA to keep this one manageable. bq. Also, for when partitions are represented using a string, we've followed the convention that they are called partition names. Can we use that for MPartitionSet? Yup, I can rename that. bq. Since MPartitionSet.partVals is a string, we should make it indexed, much like partitionName for the PARTITION table. In the latest patch, I have made it indexed. If you can take a look at the latest patch, that would be great. Add api for marking / querying set of partitions for events --- Key: HIVE-2215 URL: https://issues.apache.org/jira/browse/HIVE-2215 Project: Hive Issue Type: New Feature Components: Metastore Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: hive_2215.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1537) Allow users to specify LOCATION in CREATE DATABASE statement
[ https://issues.apache.org/jira/browse/HIVE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047772#comment-13047772 ] Bob Liu commented on HIVE-1537: --- Any idea as to when this feature will get implemented? Allow users to specify LOCATION in CREATE DATABASE statement Key: HIVE-1537 URL: https://issues.apache.org/jira/browse/HIVE-1537 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Carl Steinbach Assignee: Thiruvel Thirumoolan Attachments: hive-1537.metastore.part.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.2.patch Fix a bug. reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch
Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows:
1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
3. Move directory /tmp1 to /tmp2
4. For all files in /tmp2, remove all files starting with _tmp and duplicate files.
Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, especially for large queries. The protocol above can be modified slightly:
1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
3. Move directory /tmp2 to /tmp3
4. For all files in /tmp3, remove all duplicate files.
This should reduce the number of tmp files.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
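The modified protocol in HIVE-2201 can be sketched on a local filesystem as follows. This is a minimal illustration under stated assumptions, not the code in the attached patches: it uses java.nio.file in place of the Hadoop FileSystem API, the class and method names are hypothetical, and output files are assumed to be named "<taskId>_<attempt>" so that speculative duplicates of one task can be recognized.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TmpDirProtocolSketch {

    /** Step 1: a task attempt writes its output into tmp1 under a _tmp_ name. */
    static Path writeAttempt(Path tmp1, String name, String data) throws IOException {
        Files.createDirectories(tmp1);
        return Files.writeString(tmp1.resolve("_tmp_" + name), data);
    }

    /** Step 2: at the end of the operator, move tmp1/_tmp_<name> to tmp2/<name>. */
    static Path commitAttempt(Path attemptFile, Path tmp2, String name) throws IOException {
        Files.createDirectories(tmp2);
        return Files.move(attemptFile, tmp2.resolve(name));
    }

    /** Step 3: publish all finished files at once by renaming tmp2 to tmp3. */
    static Path publish(Path tmp2, Path tmp3) throws IOException {
        return Files.move(tmp2, tmp3);
    }

    /** Step 4: keep one file per task id; names are assumed to be "<taskId>_<attempt>". */
    static void removeDuplicates(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files); // deterministic choice: the lowest attempt number survives
        Set<String> seenTasks = new HashSet<>();
        for (Path p : files) {
            String name = p.getFileName().toString();
            String taskId = name.substring(0, name.lastIndexOf('_'));
            if (!seenTasks.add(taskId)) {
                Files.delete(p);
            }
        }
    }
}
```

Because unfinished _tmp_ files never leave tmp1, the published directory only needs the duplicate check in step 4, which is the saving the issue describes. Attempts that never finish leave their _tmp_ files behind in tmp1, and cleaning that directory up is outside this sketch.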