[jira] [Updated] (HIVE-1537) Allow users to specify LOCATION in CREATE DATABASE statement
[ https://issues.apache.org/jira/browse/HIVE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HIVE-1537: -- Attachment: hive-1537.metastore.part.patch Assuming that the location issue is solved, we need some checks in the create_database handler similar to what is there in the create_table handler. This patch is an example patch. It introduces a check in HiveMetaStore.create_database_core for existence of database directory, and also checks for failure to create one. In either case, the create_database_core operation throws an exception and the DDL would fail. Allow users to specify LOCATION in CREATE DATABASE statement Key: HIVE-1537 URL: https://issues.apache.org/jira/browse/HIVE-1537 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Carl Steinbach Assignee: Thiruvel Thirumoolan Attachments: hive-1537.metastore.part.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
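The two checks described in the comment above can be sketched roughly as follows. This is not the code from hive-1537.metastore.part.patch: the class name, method name, and exception type are hypothetical, and java.nio.file stands in for the metastore's own filesystem layer. It also assumes the comment means that an already existing directory is treated as an error.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the checks described above; not the patch itself. */
public class CreateDatabaseCheck {

    /** Raised when the database directory already exists or cannot be created. */
    public static class DatabaseDirectoryException extends Exception {
        public DatabaseDirectoryException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Fails if dbDir already exists, then tries to create it and fails
     * if the creation does not succeed. Either failure aborts the DDL.
     */
    public static void ensureDatabaseDir(Path dbDir) throws DatabaseDirectoryException {
        // Check 1: the directory must not be there yet.
        if (Files.exists(dbDir)) {
            throw new DatabaseDirectoryException(
                "Database directory already exists: " + dbDir, null);
        }
        // Check 2: creating the directory must succeed.
        try {
            Files.createDirectories(dbDir);
        } catch (IOException e) {
            throw new DatabaseDirectoryException(
                "Unable to create database directory: " + dbDir, e);
        }
    }
}
```

A caller would run this before recording the database in the metastore, so that a failed directory creation never leaves a database entry without a location.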
Re: Review Request: HIVE-1644 Use filter pushdown for automatically accessing indexes
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/558/ --- (Updated 2011-04-15 08:08:14.640798) Review request for hive. Changes --- HIVE-1644.13.patch Summary --- Review request for HIVE-1644.12.patch This addresses bug HIVE-1644. https://issues.apache.org/jira/browse/HIVE-1644 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a21f589 conf/hive-default.xml c42197f ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 6437385 ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d90b ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java dd0186d ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b78f ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1f01446 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 6162676 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java 0ae9fa2 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 937a7b3 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f0aca84 ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9 ql/src/test/queries/clientpositive/index_opt_where.q PRE-CREATION ql/src/test/queries/clientpositive/index_opt_where_partitioned.q PRE-CREATION ql/src/test/queries/clientpositive/index_opt_where_simple.q PRE-CREATION ql/src/test/results/clientpositive/index_opt_where.q.out PRE-CREATION 
ql/src/test/results/clientpositive/index_opt_where_partitioned.q.out PRE-CREATION ql/src/test/results/clientpositive/index_opt_where_simple.q.out PRE-CREATION Diff: https://reviews.apache.org/r/558/diff Testing --- Thanks, Russell
[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Melick updated HIVE-1644: - Attachment: HIVE-1644.13.patch I have several questions on the review board. I also fixed the minor issues, but have not yet created the new unit tests. use filter pushdown for automatically accessing indexes --- Key: HIVE-1644 URL: https://issues.apache.org/jira/browse/HIVE-1644 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: John Sichi Assignee: Russell Melick Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.13.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020233#comment-13020233 ] jirapos...@reviews.apache.org commented on HIVE-1644: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/558/#review472 ---

ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment953
I would have liked to just make a copy of pctx before I called rewriteForIndex(...) for every index, and then just use whichever of those corresponded to the index rewrite we chose. However, the pctx did not seem to have an easy way to copy it.

ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment957
Do we need to propagate the residual predicate any further?

ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment955
I'm kind of confused about how to check the actual table and not the metadata. When we call indexTable.getPartitionKeys() and part.getTable.getPartitionKeys(), that method calls getPartitionKeys() on the underlying Thrift Tables. Is there a way besides getPartitionKeys() that we should be using?

ql/src/test/queries/clientpositive/index_opt_where.q https://reviews.apache.org/r/558/#comment956
I have not yet added the additional unit tests.

ql/src/test/results/clientpositive/index_opt_where_partitioned.q.out https://reviews.apache.org/r/558/#comment954
I fixed the labeling for this case, but would it make sense to label our stages differently for indexing? We only relabel correctly as long as we're overwriting the highest numbered stage, since we only relabel a single task. Or, should it relabel all tasks in the whole plan? We only have easy access to the context.currentTask when we iterate through in IndexWhereProcessor (line 153).

- Russell
[jira] [Updated] (HIVE-2114) Backward incompatibility introduced from HIVE-2082 in MetaStoreUtils.getPartSchemaFromTableSchema()
[ https://issues.apache.org/jira/browse/HIVE-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2114: - Description: In MetaStoreUtils.getPartSchemaFromTableSchema(), the Properties of a Partition are first created by cloning the Properties of the Table. Some of the properties are then overwritten by the Partition-level properties (Location/InputFormat/OutputFormat etc.). The method also copies the properties from the SerDeInfo. If the SerDeInfo contains the properties 'columns', 'column_types' and 'partition_columns', this introduces an incompatibility with the previous code path, MetaStoreUtils.getSchema(). In getSchema(), 'columns' etc. are put after copying the SerDeInfo, which means we should not overwrite these 3 properties in the new code. Backward incompatibility introduced from HIVE-2082 in MetaStoreUtils.getPartSchemaFromTableSchema() --- Key: HIVE-2114 URL: https://issues.apache.org/jira/browse/HIVE-2114 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang
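The ordering problem in the description above can be reproduced with plain java.util.Properties. The sketch below is not the MetaStoreUtils code: the class and method names are made up for illustration, and only the 'columns' key is shown. It assumes the SerDeInfo parameters carry an outdated 'columns' value.

```java
import java.util.Properties;

/** Hypothetical illustration of why the copy order matters; not MetaStoreUtils. */
public class SchemaPropertyOrder {

    /** Order described for getSchema(): copy the SerDeInfo parameters, then put 'columns'. */
    public static Properties columnsPutLast(Properties serdeParams, String columns) {
        Properties schema = new Properties();
        schema.putAll(serdeParams);
        schema.setProperty("columns", columns); // the schema value wins
        return schema;
    }

    /** Order that causes the incompatibility: 'columns' is set, then the SerDeInfo copy overwrites it. */
    public static Properties serdeParamsCopiedLast(Properties serdeParams, String columns) {
        Properties schema = new Properties();
        schema.setProperty("columns", columns);
        schema.putAll(serdeParams); // a 'columns' entry in the SerDeInfo wins
        return schema;
    }

    public static void main(String[] args) {
        Properties serdeParams = new Properties();
        serdeParams.setProperty("columns", "stale_a,stale_b"); // assumed leftover value

        // prints key,value
        System.out.println(columnsPutLast(serdeParams, "key,value").getProperty("columns"));
        // prints stale_a,stale_b
        System.out.println(serdeParamsCopiedLast(serdeParams, "key,value").getProperty("columns"));
    }
}
```

The same reasoning applies to 'column_types' and 'partition_columns': whichever write happens last determines the value a reader of the Properties sees.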
[jira] [Commented] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation
[ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020386#comment-13020386 ] Namit Jain commented on HIVE-2068: -- FetchTask: return false if number of rows found. Else, it looks good. Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation --- Key: HIVE-2068 URL: https://issues.apache.org/jira/browse/HIVE-2068 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT xxx will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.
[jira] [Updated] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation
[ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-2068: - Status: Open (was: Patch Available) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation --- Key: HIVE-2068 URL: https://issues.apache.org/jira/browse/HIVE-2068 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT xxx will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.
[jira] [Updated] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation
[ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2068: -- Attachment: HIVE-2068.6.patch Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation --- Key: HIVE-2068 URL: https://issues.apache.org/jira/browse/HIVE-2068 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT xxx will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.
Re: Review Request: for HIVE-2068
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/540/ --- (Updated 2011-04-15 18:37:21.441402) Review request for hive and namit jain. Changes --- fix a small logic bug. Summary --- For HIVE-2068 This addresses bug HIVE-2068. https://issues.apache.org/jira/browse/HIVE-2068 Diffs (updated) - trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1091258 trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1091258 trunk/conf/hive-default.xml 1091258 trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/CommandNeedRetryException.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LimitDesc.java 1091258 trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessor.java 1091258 trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 1091258 trunk/ql/src/test/queries/clientpositive/global_limit.q PRE-CREATION trunk/ql/src/test/results/clientpositive/global_limit.q.out PRE-CREATION 
trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 1091258 Diff: https://reviews.apache.org/r/540/diff Testing --- added a test to test suite. Thanks, Siying
[jira] [Updated] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation
[ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2068: -- Status: Patch Available (was: Open) Fixed the issue. I think what Namit means is that the function should always return true (no more rows). Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation --- Key: HIVE-2068 URL: https://issues.apache.org/jira/browse/HIVE-2068 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT xxx will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.
[jira] [Commented] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation
[ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020390#comment-13020390 ] jirapos...@reviews.apache.org commented on HIVE-2068: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/540/ --- (Updated 2011-04-15 18:37:21.441402) Review request for hive and namit jain. Changes --- fix a small logic bug. Summary --- For HIVE-2068 This addresses bug HIVE-2068. https://issues.apache.org/jira/browse/HIVE-2068 Diff: https://reviews.apache.org/r/540/diff Testing --- added a test to test suite. Thanks, Siying Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation --- Key: HIVE-2068 URL: https://issues.apache.org/jira/browse/HIVE-2068 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT xxx will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.
[jira] [Updated] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation
[ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2068: -- Status: Open (was: Patch Available) Found a problem with the most recently modified piece of code. Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation --- Key: HIVE-2068 URL: https://issues.apache.org/jira/browse/HIVE-2068 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT xxx will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.
[jira] [Updated] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation
[ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2068: -- Attachment: (was: HIVE-2068.6.patch) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation --- Key: HIVE-2068 URL: https://issues.apache.org/jira/browse/HIVE-2068 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT xxx will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.
[jira] [Updated] (HIVE-2068) Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation
[ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2068: -- Status: Patch Available (was: Open) Deleted the latest patch. The fetchTask return part is actually OK. Speed up query select xx,xx from xxx LIMIT xxx if no filtering or aggregation --- Key: HIVE-2068 URL: https://issues.apache.org/jira/browse/HIVE-2068 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch Currently, select xx,xx from xxx where ...(only partition conditions) LIMIT xxx will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.
Build failed in Jenkins: Hive-0.7.0-h0.20 #77
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/77/ -- [...truncated 26904 lines...] [junit] Loading data to table default.srcbucket2 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.src [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt [junit] Loading data to table default.src1 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src1 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq' INTO TABLE src_sequencefile [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq [junit] 
Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq [junit] Loading data to table default.src_sequencefile [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq' INTO TABLE src_sequencefile [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src_sequencefile [junit] OK [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq' INTO TABLE src_thrift [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq [junit] Loading data to table default.src_thrift [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq' INTO TABLE src_thrift [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src_thrift [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt' INTO TABLE src_json [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt [junit] Loading data to table default.src_json [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt' INTO TABLE src_json [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src_json [junit] OK [junit] diff https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/test/logs/negative/wrong_distinct1.q.out 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/ql/src/test/results/compiler/errors/wrong_distinct1.q.out [junit] Done query: wrong_distinct1.q [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201104151212_752127897.txt [junit] Begin query: wrong_distinct2.q [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201104151212_806432548.txt [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11') [junit]
[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020423#comment-13020423 ] John Sichi commented on HIVE-1644: -- Responses added in review board. use filter pushdown for automatically accessing indexes --- Key: HIVE-1644 URL: https://issues.apache.org/jira/browse/HIVE-1644 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: John Sichi Assignee: Russell Melick Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.13.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020422#comment-13020422 ] jirapos...@reviews.apache.org commented on HIVE-1644: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/558/#review482 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment982 A few comments here. 1) Rather than passing in the entire table scan object and letting the handler set properties on it, I think we should just have the handler pass back the necessary information (input format and intermediate file). 2) The generateIndexQuery method's parameter list is growing. For plugin interfaces, a good pattern we've been using in other places is to introduce a new context class (say HiveIndexQueryContext) with getters and setters for the information to be communicated in both directions. Then the caller instantiates one of these and passes in an instance. The plugin reads and writes to the context. On return, the caller gets the modified information out. The main benefit is that in the future, if we need to pass more information, we just add new members to the context class, and none of the existing plugin implementations break. In this case, you could also put the context objects in a map (instead of having to keep multiple maps indexQueryTasks/additionalInputs etc). ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java https://reviews.apache.org/r/558/#comment983 Just put it as a TODO for now; create the followup JIRA issue and reference it in the TODO. 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
https://reviews.apache.org/r/558/#comment990

Look in Hive.java; there are methods like public List<Partition> getPartitionsByNames(Table tbl, List<String> partNames) which look up the actual partitions for a table from the metastore. You can pass in indexTable.

ql/src/test/results/clientpositive/index_opt_where_partitioned.q.out
https://reviews.apache.org/r/558/#comment991

Hmm...what if we could avoid relabeling altogether? If you look in Driver.java, there's a method compile which calls TaskFactory.resetId(). This is what causes us to start back over from 0. If you add an optional parameter resetTaskIds=true, and then pass false for the Driver instance used for compiling the reentrant query, that might do it.

- John

On 2011-04-15 08:08:14, Russell Melick wrote:
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/558/
bq.
bq. (Updated 2011-04-15 08:08:14)
bq.
bq. Review request for hive.
bq.
bq. Summary
bq.
bq. Review request for HIVE-1644.12.patch
bq.
bq. This addresses bug HIVE-1644.
bq. https://issues.apache.org/jira/browse/HIVE-1644
bq.
bq. Diffs
bq.
bq. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a21f589
bq. conf/hive-default.xml c42197f
bq. ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 6437385
bq. ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d90b
bq. ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java dd0186d
bq. ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b78f
bq. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1f01446
bq. ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c
bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 6162676
bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java PRE-CREATION
bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java 0ae9fa2
bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcCtx.java PRE-CREATION
bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java PRE-CREATION
bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java PRE-CREATION
bq. ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 937a7b3
bq. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f0aca84
bq. ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9
bq. ql/src/test/queries/clientpositive/index_opt_where.q PRE-CREATION
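The resetTaskIds idea from comment991 above could look roughly like the following. This is a sketch under assumptions, not Hive's code: TaskIdFactorySketch and DriverSketch are stand-ins invented here for TaskFactory and Driver, and the constructor flag is one possible way to express the "optional parameter" the reviewer mentions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a global task-id generator (hypothetical, modelled on the
// TaskFactory.resetId() behaviour described in the review).
class TaskIdFactorySketch {
    private static final AtomicInteger NEXT_ID = new AtomicInteger(0);

    static void resetId() { NEXT_ID.set(0); }
    static int getAndIncrementId() { return NEXT_ID.getAndIncrement(); }
}

// Stand-in for a driver whose compile step normally restarts ids from 0.
class DriverSketch {
    private final boolean resetTaskIds;

    // Default keeps the existing behaviour: ids restart on every compile.
    DriverSketch() { this(true); }

    // Pass false for a nested (reentrant) compilation so its task ids
    // continue after the ids already handed out by the outer compilation.
    DriverSketch(boolean resetTaskIds) { this.resetTaskIds = resetTaskIds; }

    // Pretends to compile a query that needs one task; returns that task's id.
    int compile(String query) {
        if (resetTaskIds) {
            TaskIdFactorySketch.resetId();
        }
        return TaskIdFactorySketch.getAndIncrementId();
    }
}

class ResetTaskIdsDemo {
    public static void main(String[] args) {
        int outer = new DriverSketch().compile("outer query");
        int nested = new DriverSketch(false).compile("index query");
        System.out.println(outer + " " + nested);
    }
}
```

With the flag set to false for the nested driver, the two compilations never hand out the same id, so no relabeling step is needed afterwards.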
[jira] [Updated] (HIVE-2114) Backward incompatibility introduced from HIVE-2082 in MetaStoreUtils.getPartSchemaFromTableSchema()
[ https://issues.apache.org/jira/browse/HIVE-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2114: - Attachment: HIVE-2114.patch Backward incompatibility introduced from HIVE-2082 in MetaStoreUtils.getPartSchemaFromTableSchema() --- Key: HIVE-2114 URL: https://issues.apache.org/jira/browse/HIVE-2114 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2114.patch In MetaStoreUtils.getPartSchemaFromTableSchema() the Properties of a Partition are first created by cloning the Properties from a Table. Then some of the properties are overwritten by the Partition-level properties (Location/InputFormat/OutputFormat etc.). It also copies the properties from the SerDeInfo. If the SerDeInfo contains the properties 'columns', 'column_types' and 'partition_columns', this introduces an incompatibility with the previous code path, MetaStoreUtils.getSchema(). In getSchema(), the 'columns' etc. are put after copying the SerDeInfo, which means we should not overwrite these 3 properties in the new code. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-2114. Backward incompatibility introduced from HIVE-2082 in MetaStoreUtils.getPartSchemaFromTableSchema()
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/605/ --- Review request for hive. Summary --- This prevents the columns, column.types and partition_columns properties in the SerDeInfo from overwriting the Partition properties (to stay backward compatible). Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1092766 Diff: https://reviews.apache.org/r/605/diff Testing --- Unit tests are still running. Thanks, Ning
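The ordering problem behind this review request can be illustrated with plain java.util.Properties. The sketch below is not the HIVE-2114 patch: PartSchemaSketch and its merge method are hypothetical, and the protected key names are copied from the issue description ('columns', 'column_types', 'partition_columns'), which may differ from the constants Hive actually uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

class PartSchemaSketch {
    // Keys that the SerDeInfo parameters must not overwrite.
    // Names are taken from the issue text and are an assumption here.
    private static final Set<String> PROTECTED_KEYS = new HashSet<>(
            Arrays.asList("columns", "column_types", "partition_columns"));

    // Clone the table-level schema, then overlay SerDe parameters while
    // skipping the protected keys so the table-derived values survive.
    static Properties merge(Properties tableSchema, Map<String, String> serdeParams) {
        Properties schema = (Properties) tableSchema.clone();
        for (Map.Entry<String, String> entry : serdeParams.entrySet()) {
            if (!PROTECTED_KEYS.contains(entry.getKey())) {
                schema.setProperty(entry.getKey(), entry.getValue());
            }
        }
        return schema;
    }
}
```

An unconditional overlay of the SerDe parameters would let a stale 'columns' entry replace the value derived from the table, which is the incompatibility the issue describes.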
[jira] [Commented] (HIVE-2114) Backward incompatibility introduced from HIVE-2082 in MetaStoreUtils.getPartSchemaFromTableSchema()
[ https://issues.apache.org/jira/browse/HIVE-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020429#comment-13020429 ] Ning Zhang commented on HIVE-2114: -- https://reviews.apache.org/r/605/
[jira] [Updated] (HIVE-2114) Backward incompatibility introduced from HIVE-2082 in MetaStoreUtils.getPartSchemaFromTableSchema()
[ https://issues.apache.org/jira/browse/HIVE-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2114: - Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-2038) Metastore listener
[ https://issues.apache.org/jira/browse/HIVE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020453#comment-13020453 ] jirapos...@reviews.apache.org commented on HIVE-2038: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/606/ --- Review request for Carl Steinbach. Summary --- Addressed Carl's comments. This addresses bug HIVE-2038. https://issues.apache.org/jira/browse/HIVE-2038 Diffs - trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1092811 trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 1092811 trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 1092811 trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 1092811 trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 1092811 trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 1092811 trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1092811 trunk/conf/hive-default.xml 1092811 trunk/metastore/if/hive_metastore.thrift 1092811 trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1092811 trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1092811 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1092811 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1092811 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1092811 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1092811 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1092811 Diff: https://reviews.apache.org/r/606/diff Testing --- Thanks, Ashutosh Metastore listener -- Key: HIVE-2038 URL: https://issues.apache.org/jira/browse/HIVE-2038 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: hive-2038.patch, metastore_listener.patch, metastore_listener.patch, metastore_listener.patch Provide a way to observe changes happening on the Metastore -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2115) Process residual predicate during automatic index query
Process residual predicate during automatic index query --- Key: HIVE-2115 URL: https://issues.apache.org/jira/browse/HIVE-2115 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: Russell Melick During automatic use of indexes, we analyze the query predicate to pull out what we can use the index on. Not all of the predicate can necessarily be processed with a single index. Currently, we return the residual predicate from the IndexHandler but do not process it further. We run the full original predicate on the index table. Ideally, we would want to only run the residual predicate so we don't have to do as much processing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
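The split between the part of a predicate an index can serve and the residual part can be sketched as below. This only illustrates the decomposition the issue refers to, under strong simplifying assumptions: the predicate is modelled as a flat list of AND-ed conjuncts, each tied to one column, and Conjunct, PredicateSplit and ResidualSketch are hypothetical names that do not correspond to Hive's IndexHandler classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// One AND-ed term of a WHERE clause, e.g. "key < 10". Purely illustrative.
class Conjunct {
    final String column;
    final String expr;

    Conjunct(String column, String expr) {
        this.column = column;
        this.expr = expr;
    }
}

// Result of the decomposition: terms the index can evaluate, and the rest.
class PredicateSplit {
    final List<Conjunct> indexable = new ArrayList<>();
    final List<Conjunct> residual = new ArrayList<>();
}

class ResidualSketch {
    // Terms on the indexed column go to the index query; all other terms
    // form the residual predicate that still has to be evaluated elsewhere.
    static PredicateSplit split(List<Conjunct> conjuncts, String indexedColumn) {
        PredicateSplit result = new PredicateSplit();
        for (Conjunct c : conjuncts) {
            if (c.column.equals(indexedColumn)) {
                result.indexable.add(c);
            } else {
                result.residual.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Conjunct> where = Arrays.asList(
                new Conjunct("key", "key < 10"),
                new Conjunct("value", "value = 'a'"));
        PredicateSplit s = split(where, "key");
        System.out.println(s.indexable.size() + " indexable, " + s.residual.size() + " residual");
    }
}
```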