[jira] [Updated] (HIVE-2307) Schema creation scripts for PostgreSQL use bit(1) instead of boolean
[ https://issues.apache.org/jira/browse/HIVE-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez updated HIVE-2307:
------------------------------------
    Attachment: HIVE-2307.1.patch.txt

Schema creation scripts for PostgreSQL use bit(1) instead of boolean
--------------------------------------------------------------------

                 Key: HIVE-2307
                 URL: https://issues.apache.org/jira/browse/HIVE-2307
             Project: Hive
          Issue Type: Bug
          Components: Configuration, Metastore
    Affects Versions: 0.5.0, 0.6.0, 0.7.0, 0.7.1
            Reporter: Esteban Gutierrez
            Assignee: Esteban Gutierrez
              Labels: metastore, postgres
         Attachments: HIVE-2307.1.patch.txt

The DEFERRED_REBUILD (IDXS) and IS_COMPRESSED (SDS) columns in the metastore schema are defined as bit(1), a type the PostgreSQL JDBC driver does not accept for boolean values:

{code}
hive> create table test (id int);
FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object
"org.apache.hadoop.hive.metastore.model.MStorageDescriptor@4f1adeb7" using statement
"INSERT INTO SDS (SD_ID,INPUT_FORMAT,OUTPUT_FORMAT,LOCATION,SERDE_ID,NUM_BUCKETS,IS_COMPRESSED)
VALUES (?,?,?,?,?,?,?)" failed : ERROR: column "IS_COMPRESSED" is of type bit
but expression is of type boolean
{code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
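The fix amounts to declaring these columns as boolean in the PostgreSQL schema scripts. For an existing metastore, a migration along these lines would be needed (a hedged sketch only: the table and column names come from the error above, but the USING expressions are an assumption, not taken from the attached patch):

```sql
-- Convert the bit(1) columns to boolean so the PostgreSQL JDBC driver
-- can bind Java booleans directly (sketch; verify against the real patch).
ALTER TABLE "SDS"  ALTER COLUMN "IS_COMPRESSED"    TYPE boolean USING ("IS_COMPRESSED" = B'1');
ALTER TABLE "IDXS" ALTER COLUMN "DEFERRED_REBUILD" TYPE boolean USING ("DEFERRED_REBUILD" = B'1');
```

For fresh installs, the schema creation script would simply declare the columns as `boolean` from the start.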
[jira] [Updated] (HIVE-2307) Schema creation scripts for PostgreSQL use bit(1) instead of boolean
[ https://issues.apache.org/jira/browse/HIVE-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez updated HIVE-2307:
------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-1850) alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)
[ https://issues.apache.org/jira/browse/HIVE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu reassigned HIVE-1850:
---------------------------------------------
    Assignee: Amareshwari Sriramadasu

alter table set serdeproperties bypasses regexp checks (leaves table in a non-recoverable state?)
-------------------------------------------------------------------------------------------------

                 Key: HIVE-1850
                 URL: https://issues.apache.org/jira/browse/HIVE-1850
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 0.7.0
         Environment: Trunk build from a few days ago, but seen once before with an older version as well.
            Reporter: Terje Marthinussen
            Assignee: Amareshwari Sriramadasu

{code}
create table aa ( test STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "[^\\](.*)", "output.format.string" = "$1s");
{code}

This will fail. Great!

{code}
create table aa ( test STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.*)", "output.format.string" = "$1s");
{code}

Works, no problem there.

{code}
alter table aa set serdeproperties ("input.regex" = "[^\\](.*)", "output.format.string" = "$1s");
{code}

Wups... I can set that without any problems!

{code}
alter table aa set serdeproperties ("input.regex" = "(.*)", "output.format.string" = "$1s");
FAILED: Hive Internal Error: java.util.regex.PatternSyntaxException(Unclosed character class near index 7
[^\](.*)
       ^)
java.util.regex.PatternSyntaxException: Unclosed character class near index 7
[^\](.*)
       ^
	at java.util.regex.Pattern.error(Pattern.java:1713)
	at java.util.regex.Pattern.clazz(Pattern.java:2254)
	at java.util.regex.Pattern.sequence(Pattern.java:1818)
	at java.util.regex.Pattern.expr(Pattern.java:1752)
	at java.util.regex.Pattern.compile(Pattern.java:1460)
	at java.util.regex.Pattern.<init>(Pattern.java:1133)
	at java.util.regex.Pattern.compile(Pattern.java:847)
	at org.apache.hadoop.hive.contrib.serde2.RegexSerDe.initialize(RegexSerDe.java:101)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:199)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
	at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:484)
	at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:161)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:803)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableSerdeProps(DDLSemanticAnalyzer.java:558)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:232)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:142)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:370)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}

After this, all further commands on the table fail, including drop table :)

1. The alter table command should probably check the regexp just like the create table command does.
2. Even though the regexp is bad, it should be possible to do things like set the regexp again or drop the table.
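Point 1 is cheap to implement, because java.util.regex.Pattern reports a bad pattern eagerly at compile time. A minimal sketch of the kind of check the alter-table path could run before committing the new serde properties (class and method names here are illustrative, not Hive's):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SerdePropertyCheck {
    // Compile the proposed input.regex up front, as create-table effectively
    // does via RegexSerDe.initialize, so a bad pattern is rejected before
    // it is persisted to the metastore.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("(.*)"));       // true
        // The pattern from the report: unclosed character class.
        System.out.println(isValidRegex("[^\\](.*)"));  // false
    }
}
```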
[jira] [Commented] (HIVE-956) Add support of columnar binary serde
[ https://issues.apache.org/jira/browse/HIVE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071016#comment-13071016 ]

Hudson commented on HIVE-956:
-----------------------------
Integrated in Hive-trunk-h0.21 #849 (See [https://builds.apache.org/job/Hive-trunk-h0.21/849/])
HIVE-956: add support of columnar binary serde (Krishna Kumar via He Yongqiang)

heyongqiang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1150978
Files :
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDeBase.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/columnar
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java

Add support of columnar binary serde
------------------------------------

                 Key: HIVE-956
                 URL: https://issues.apache.org/jira/browse/HIVE-956
             Project: Hive
          Issue Type: New Feature
            Reporter: He Yongqiang
            Assignee: Krishna Kumar
         Attachments: HIVE-956v3.patch, HIVE-956v4.patch, HIVE.956.patch.0, HIVE.956.patch.1, HIVE.956.patch.2
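The committed class names suggest how the feature is used from HiveQL; a hedged sketch (the table and columns are made up, and only the serde class name is taken from the commit above):

```sql
-- Store a table in RCFile with the new binary columnar serde
-- (sketch; everything except the serde class name is illustrative).
CREATE TABLE events_rc (id INT, payload STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
STORED AS RCFILE;
```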
[jira] [Updated] (HIVE-1850) alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)
[ https://issues.apache.org/jira/browse/HIVE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1850:
------------------------------------------
    Attachment: patch-1850.txt

Although DDLTask.alterTable() calls checkValidity() on the table after all the alterations, it did not catch this problem, because getDeserializer() was not reading the table back from the Metastore with the modified properties. The patch makes the required change and adds a regression test.
[jira] [Updated] (HIVE-1850) alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)
[ https://issues.apache.org/jira/browse/HIVE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1850:
------------------------------------------
    Fix Version/s: 0.8.0
           Status: Patch Available  (was: Open)
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071130#comment-13071130 ]

jirapos...@reviews.apache.org commented on HIVE-1694:
-----------------------------------------------------
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/

Review request for hive and John Sichi.

Summary
-------
This patch defines a new AggregateIndexHandler which is used to optimize the query plan for group-by queries.

This addresses bug HIVE-1694.
    https://issues.apache.org/jira/browse/HIVE-1694

Diffs
-----
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1194/diff

Testing
-------

Thanks,
Prajakta

Accelerate GROUP BY execution using indexes
-------------------------------------------

                 Key: HIVE-1694
                 URL: https://issues.apache.org/jira/browse/HIVE-1694
             Project: Hive
          Issue Type: New Feature
          Components: Indexing, Query Processor
    Affects Versions: 0.7.0
            Reporter: Nikhil Deshpande
            Assignee: Prajakta Kalmegh
         Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql

The index building patch (HIVE-417) is checked into trunk; this JIRA issue tracks support for indexes in the Hive compiler/execution engine for SELECT queries. This is in reference to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating a separate JIRA issue for tracking index usage in optimizer/query execution.

The aim of this effort is to use indexes to accelerate query execution (for certain classes of queries), e.g.:
- Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?)
- Joins (index-based joins)
- Group By, Order By, and other misc cases

The proposal is multi-step:
1. Building index-based operators, compiler and execution engine changes
2. Optimizer enhancements (e.g. a cost-based optimizer to compare and choose between index scans, full table scans, etc.)

This JIRA initially focuses on the first step, and is expected to hold the information about index-based plans and operator implementations for the above cases.
[jira] [Updated] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prajakta Kalmegh updated HIVE-1694:
-----------------------------------
    Attachment: HIVE-1694.4.patch

Review changes made after the last review, plus new functionality (see the post for more details).
[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071135#comment-13071135 ]

Prajakta Kalmegh commented on HIVE-1694:
----------------------------------------
Hi John,

Please find attached the latest patch (HIVE-1694.4.patch).

The patch contains:
1. Support for multiple aggregates in index creation using the AggregateIndexHandler. The column names for the index schema are constructed dynamically from the aggregates: for 'aggregateFunction(columnName)', the index column name is `_aggregateFunction_of_columnName`. For example, for count(l_shipdate) the column name is `_count_of_l_shipdate`; for 'count(*)', the column name is `_count_of_all`.
2. Fixed the bug with duplicates in the group-by removal cases. We no longer remove the group-by in any case, which has made the query-rewrite logic much simpler than before. We removed four classes from the previous patch (RewriteIndexSubqueryCtx.java, RewriteIndexSubqueryProcFactory.java, RewriteRemoveGroupbyCtx.java, RewriteRemoveGroupbyProcFactory.java) and added two new, simpler classes instead (RewriteQueryUsingAggregateIndex.java, RewriteQueryUsingAggregateIndexCtx.java).
3. Added a new query (with 'UNION ALL') to the same ql_rewrite_gbtoidx.q file to demonstrate the requirement from your last post. Please note that the query is not a valid real-world use case, but it suffices to show that rewriting one branch does not corrupt the other.
4. The rewrite optimization now happens after PredicatePushdown, PartitionPruner, and PartitionConditionRemover.

This patch does not contain:
1. Optimization for cases with multiple aggregates in the selection
2. Optimization for any aggregate function other than count
3. Optimization for queries involving multiple tables (even if they are in different branches). Since we are not optimizing joins, this constraint also filters out union queries over different tables.
4. Optimization for indexes with multiple columns in the key

Here is the review board link for the patch: https://reviews.apache.org/r/1194/. Please let me know if you have any questions.
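The naming scheme in point 1 can be illustrated with a hedged HiveQL sketch (the table, column, and index names are made up, and the IDXPROPERTIES syntax is an assumption, not quoted from the patch):

```sql
-- Create an aggregate index over l_shipdate; per the scheme above, the
-- handler would materialize an index column named _count_of_l_shipdate.
-- (Sketch only; verify the exact DDL against the attached patch.)
CREATE INDEX lineitem_shipdate_idx ON TABLE lineitem (l_shipdate)
AS 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ("AGGREGATES" = "count(l_shipdate)");

-- A query of the shape the optimizer could then rewrite to scan the
-- (much smaller) index table instead of the base table:
SELECT l_shipdate, count(l_shipdate)
FROM lineitem
GROUP BY l_shipdate;
```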
[jira] [Work started] (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-1694 started by Prajakta Kalmegh.
Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/

Review request for hive and John Sichi.

Summary
-------
This patch defines a new AggregateIndexHandler which is used to optimize the query plan for group-by queries.

This addresses bug HIVE-1694.
    https://issues.apache.org/jira/browse/HIVE-1694

Diff: https://reviews.apache.org/r/1194/diff

Testing
-------

Thanks,
Prajakta
[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071291#comment-13071291 ]

Paul Yang commented on HIVE-2226:
---------------------------------
Committed. Thanks Sohan!

Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
---------------------------------------------------------------------------------------------------

                 Key: HIVE-2226
                 URL: https://issues.apache.org/jira/browse/HIVE-2226
             Project: Hive
          Issue Type: Improvement
          Components: Metastore
            Reporter: Sohan Jain
            Assignee: Sohan Jain
             Fix For: 0.8.0
         Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch

Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similarly to the one in HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed.
[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-2226:
----------------------------
       Resolution: Fixed
    Fix Version/s: 0.8.0
           Status: Resolved  (was: Patch Available)
[jira] [Updated] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franklin Hu updated HIVE-2272: -- Attachment: hive-2272.6.patch rebase add TIMESTAMP data type --- Key: HIVE-2272 URL: https://issues.apache.org/jira/browse/HIVE-2272 Project: Hive Issue Type: New Feature Reporter: Franklin Hu Assignee: Franklin Hu Attachments: hive-2272.1.patch, hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, hive-2272.5.patch, hive-2272.6.patch Add a TIMESTAMP type to serde2 that supports the Unix timestamp range (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision, using both the LazyBinary and LazySimple SerDes. For LazySimpleSerDe, the data is stored as JDBC-compliant strings parsable by java.sql.Timestamp.
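For the LazySimple text form described above, parsing amounts to splitting the optional fractional part off the JDBC-style base format and right-padding it to nanoseconds. A minimal Python sketch, illustrative only and not Hive's serde2 code:

```python
from datetime import datetime, timezone

def parse_jdbc_timestamp(s):
    """Parse 'YYYY-MM-DD HH:MM:SS[.fff...]' into (epoch seconds, nanoseconds)."""
    if "." in s:
        base, frac = s.split(".", 1)
        # Right-pad the fractional digits to 9 places to get nanoseconds.
        nanos = int(frac.ljust(9, "0")[:9])
    else:
        base, nanos = s, 0
    dt = datetime.strptime(base, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()), nanos
```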
Re: Review Request: HIVE-2286: ClassCastException when building index with security.authorization turned on
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/#review1188 --- ql/src/java/org/apache/hadoop/hive/ql/Driver.java https://reviews.apache.org/r/1137/#comment2597 java.util.Stack is deprecated since it adds unnecessary synchronization. We don't have a replacement yet (HIVE-1626) so we've just been using ArrayList. Also, instead of typecasting to/from Object, use a static inner class for holding the record of state variables. - John On 2011-07-25 23:03:22, Syed Albiz wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/ --- (Updated 2011-07-25 23:03:22) Review request for hive, John Sichi and Ning Zhang. Summary --- Save the original HiveOperation/commandType when we generate the index builder task and restore it after we're done generating the task so that the authorization checks make the right decision when deciding what to do. This addresses bug HIVE-2286. https://issues.apache.org/jira/browse/HIVE-2286 Diffs - ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION ql/src/test/results/clientnegative/index_compact_entry_limit.q.out fcb2673 ql/src/test/results/clientnegative/index_compact_size_limit.q.out fcb2673 ql/src/test/results/clientpositive/index_auth.q.out PRE-CREATION ql/src/test/results/clientpositive/index_auto.q.out 8d65f98 ql/src/test/results/clientpositive/index_auto_file_format.q.out 194b35e ql/src/test/results/clientpositive/index_auto_multiple.q.out 6b81fc3 ql/src/test/results/clientpositive/index_auto_partitioned.q.out b0635db ql/src/test/results/clientpositive/index_auto_unused.q.out 3631bbc ql/src/test/results/clientpositive/index_bitmap.q.out 8f41ce3 ql/src/test/results/clientpositive/index_bitmap1.q.out 9f638f5 ql/src/test/results/clientpositive/index_bitmap2.q.out e901477 ql/src/test/results/clientpositive/index_bitmap3.q.out 116c973 
ql/src/test/results/clientpositive/index_bitmap_auto.q.out cc9d91e ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 9003eb4 ql/src/test/results/clientpositive/index_bitmap_rc.q.out 9bd3c98 ql/src/test/results/clientpositive/index_compact.q.out c339ec9 ql/src/test/results/clientpositive/index_compact_1.q.out 34ba3ca ql/src/test/results/clientpositive/index_compact_2.q.out e8ce238 ql/src/test/results/clientpositive/index_compact_3.q.out d39556d ql/src/test/results/clientpositive/index_creation.q.out 532f07e Diff: https://reviews.apache.org/r/1137/diff Testing --- Added new testcase to TestCliDriver: index_auth.q Thanks, Syed
[jira] [Commented] (HIVE-2286) ClassCastException when building index with security.authorization turned on
[ https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071315#comment-13071315 ] jirapos...@reviews.apache.org commented on HIVE-2286: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/#review1188 --- ql/src/java/org/apache/hadoop/hive/ql/Driver.java https://reviews.apache.org/r/1137/#comment2597 java.util.Stack is deprecated since it adds unnecessary synchronization. We don't have a replacement yet (HIVE-1626) so we've just been using ArrayList. Also, instead of typecasting to/from Object, use a static inner class for holding the record of state variables. - John On 2011-07-25 23:03:22, Syed Albiz wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1137/ bq. --- bq. bq. (Updated 2011-07-25 23:03:22) bq. bq. bq. Review request for hive, John Sichi and Ning Zhang. bq. bq. bq. Summary bq. --- bq. bq. Save the original HiveOperation/commandType when we generate the index builder task and restore it after we're done generating the task so that the authorization checks make the right decision when deciding what to do. bq. bq. bq. This addresses bug HIVE-2286. bq. https://issues.apache.org/jira/browse/HIVE-2286 bq. bq. bq. Diffs bq. - bq. 
bq.ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe bq.ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION bq.ql/src/test/results/clientnegative/index_compact_entry_limit.q.out fcb2673 bq.ql/src/test/results/clientnegative/index_compact_size_limit.q.out fcb2673 bq.ql/src/test/results/clientpositive/index_auth.q.out PRE-CREATION bq.ql/src/test/results/clientpositive/index_auto.q.out 8d65f98 bq.ql/src/test/results/clientpositive/index_auto_file_format.q.out 194b35e bq.ql/src/test/results/clientpositive/index_auto_multiple.q.out 6b81fc3 bq.ql/src/test/results/clientpositive/index_auto_partitioned.q.out b0635db bq.ql/src/test/results/clientpositive/index_auto_unused.q.out 3631bbc bq.ql/src/test/results/clientpositive/index_bitmap.q.out 8f41ce3 bq.ql/src/test/results/clientpositive/index_bitmap1.q.out 9f638f5 bq.ql/src/test/results/clientpositive/index_bitmap2.q.out e901477 bq.ql/src/test/results/clientpositive/index_bitmap3.q.out 116c973 bq.ql/src/test/results/clientpositive/index_bitmap_auto.q.out cc9d91e bq.ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 9003eb4 bq.ql/src/test/results/clientpositive/index_bitmap_rc.q.out 9bd3c98 bq.ql/src/test/results/clientpositive/index_compact.q.out c339ec9 bq.ql/src/test/results/clientpositive/index_compact_1.q.out 34ba3ca bq.ql/src/test/results/clientpositive/index_compact_2.q.out e8ce238 bq.ql/src/test/results/clientpositive/index_compact_3.q.out d39556d bq.ql/src/test/results/clientpositive/index_creation.q.out 532f07e bq. bq. Diff: https://reviews.apache.org/r/1137/diff bq. bq. bq. Testing bq. --- bq. bq. Added new testcase to TestCliDriver: index_auth.q bq. bq. bq. Thanks, bq. bq. Syed bq. bq. ClassCastException when building index with security.authorization turned on Key: HIVE-2286 URL: https://issues.apache.org/jira/browse/HIVE-2286 Project: Hive Issue Type: Bug Reporter: Syed S. Albiz Assignee: Syed S. 
Albiz Attachments: HIVE-2286.1.patch, HIVE-2286.2.patch When trying to build an index with authorization checks turned on, hive issues the following ClassCastException: org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer cannot be cast to org.apache.hadoop.hive.ql.parse.SemanticAnalyzer at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:540) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:848) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:224) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:293) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:385) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:392) at
[jira] [Commented] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071318#comment-13071318 ] Vaibhav Aggarwal commented on HIVE-2020: I propose using -d, --define to define Hive variables. Amazon Elastic MapReduce already uses this notation for Hive variables and variable substitution. This approach would also clearly separate the use of -hiveconf from -d or --define, which would be used purely to set Hive variables. It would also maintain consistency for Hive users. Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements:
* Create a separate namespace for managing Hive variables.
* Add support for setting variables on the command line via '-hivevar x=y'
* Add support for setting variables through the CLI via 'var x=y'
* Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}'
* Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v'
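The namespace separation proposed in this ticket can be illustrated with a toy substitution pass in which hivevar and hiveconf resolve from separate maps, and a bare `${name}` falls back to the hivevar namespace. Illustrative Python, not Hive's implementation:

```python
import re

def substitute(query, hivevars, hiveconf):
    """Expand ${hivevar:name} and ${name} from hivevars, and
    ${hiveconf:name} from hiveconf; unresolved references are left as-is."""
    pattern = re.compile(r"\$\{(?:(?P<ns>hivevar|hiveconf):)?(?P<name>[\w.]+)\}")

    def repl(m):
        ns, name = m.group("ns"), m.group("name")
        if ns == "hiveconf":
            return hiveconf.get(name, m.group(0))
        # Bare ${name} and ${hivevar:name} both resolve against hivevars.
        return hivevars.get(name, m.group(0))

    return pattern.sub(repl, query)
```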
[jira] [Updated] (HIVE-2305) UNION ALL on different types throws runtime exception
[ https://issues.apache.org/jira/browse/HIVE-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franklin Hu updated HIVE-2305: -- Attachment: hive-2305.2.patch fix upstream input file change propagation UNION ALL on different types throws runtime exception - Key: HIVE-2305 URL: https://issues.apache.org/jira/browse/HIVE-2305 Project: Hive Issue Type: Bug Affects Versions: 0.7.1 Reporter: Franklin Hu Assignee: Franklin Hu Attachments: hive-2305.1.patch, hive-2305.2.patch Ex: SELECT * FROM (SELECT 123 FROM ... UNION ALL SELECT '123' FROM ..) t; Unioning columns of different types currently throws runtime exceptions.
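One way to avoid such a runtime failure is to resolve a common type for corresponding UNION ALL columns at plan time. The widening rule below is a hypothetical sketch, not necessarily what hive-2305.2.patch implements:

```python
def common_type(t1, t2):
    """Toy type-widening rule for UNION ALL branches: equal types pass
    through, numerics widen to the larger numeric, anything else falls
    back to string."""
    numeric = ["int", "bigint", "float", "double"]  # widening order
    if t1 == t2:
        return t1
    if t1 in numeric and t2 in numeric:
        return numeric[max(numeric.index(t1), numeric.index(t2))]
    return "string"
```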
[jira] [Assigned] (HIVE-1143) CREATE VIEW followup: updatable views
[ https://issues.apache.org/jira/browse/HIVE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi reassigned HIVE-1143: Assignee: Charles Chen (was: Carl Steinbach) CREATE VIEW followup: updatable views -- Key: HIVE-1143 URL: https://issues.apache.org/jira/browse/HIVE-1143 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen For HIVE-972, we only implemented read-only views. Updatable views are difficult in general, but for simple cases where views are being used to impose a rename layer on existing tables/columns, update support would be high value (for consistent read/write access) and not a lot of work.
[jira] [Assigned] (HIVE-1989) recognize transitivity of predicates on join keys
[ https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi reassigned HIVE-1989: Assignee: Charles Chen recognize transitivity of predicates on join keys - Key: HIVE-1989 URL: https://issues.apache.org/jira/browse/HIVE-1989 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Given
{noformat}
set hive.mapred.mode=strict;
create table invites (foo int, bar string) partitioned by (ds string);
create table invites2 (foo int, bar string) partitioned by (ds string);
select count(*) from invites join invites2 on invites.ds=invites2.ds and invites.ds='2011-01-01';
{noformat}
currently an error occurs:
{noformat}
Error in semantic analysis: No Partition Predicate Found for Alias invites2 Table invites2
{noformat}
The optimizer should be able to infer a predicate on invites2 via transitivity. The current lack places a burden on the user to add a redundant predicate, and makes impossible (at least in strict mode) join views where both underlying tables are partitioned (the join select list has to pick one of the tables arbitrarily).
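The inference the optimizer needs here is a closure over join-key equalities: any constant predicate on one column propagates to every column equated with it. A small sketch, illustrative and not Hive's optimizer code:

```python
def infer_transitive(equalities, constants):
    """Propagate constant predicates across join-key equalities.
    equalities: pairs of columns equated in join conditions.
    constants: dict column -> literal from explicit predicates."""
    inferred = dict(constants)
    changed = True
    while changed:  # iterate to a fixed point over chains of equalities
        changed = False
        for a, b in equalities:
            for x, y in ((a, b), (b, a)):
                if x in inferred and y not in inferred:
                    inferred[y] = inferred[x]
                    changed = True
    return inferred
```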
[jira] [Commented] (HIVE-2123) CommandNeedRetryException needs release locks
[ https://issues.apache.org/jira/browse/HIVE-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071327#comment-13071327 ] John Sichi commented on HIVE-2123: -- This one has been sitting in the Patch Available queue for a while...anything holding it up? CommandNeedRetryException needs release locks - Key: HIVE-2123 URL: https://issues.apache.org/jira/browse/HIVE-2123 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2123.1.patch, HIVE-2123.2.patch, HIVE-2123.3.patch, HIVE-2123.4.patch Now when CommandNeedRetryException is thrown, locks are not released. Not sure whether it will cause a problem, since the same locks will be acquired when retrying it. It is anyway something we need to fix. Also we can do a little code cleanup to make future mistakes less likely.
[jira] [Commented] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions
[ https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071328#comment-13071328 ] John Sichi commented on HIVE-2242: -- This one has been sitting in the Patch Available queue for a while...anything holding it up? DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions - Key: HIVE-2242 URL: https://issues.apache.org/jira/browse/HIVE-2242 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2242.1.patch Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to Pre Execution hooks. It should also include all matches from partial specifications. E.g., suppose you have a table {{create table test_table (a string) partitioned by (p1 string, p2 string);}} {{alter table test_table add partition (p1=1, p2=1);}} {{alter table test_table add partition (p1=1, p2=2);}} {{alter table test_table add partition (p1=2, p2=2);}} and you run {{alter table test_table drop partition(p1=1);}} Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get WriteEntity objects with the partitions p1=1/p2=1 and p1=1/p2=2.
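The expected matching behavior for a partial specification is simply that every key present in the spec agrees with the partition's value, and keys absent from the spec match anything. An illustrative Python sketch:

```python
def matches_partial_spec(partition, spec):
    """True if the partition's key/values agree with every key in the
    (possibly partial) drop specification."""
    return all(partition.get(k) == v for k, v in spec.items())

def partitions_to_drop(partitions, spec):
    """All partitions that should reach the pre-execution hooks."""
    return [p for p in partitions if matches_partial_spec(p, spec)]
```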
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071330#comment-13071330 ] John Sichi commented on HIVE-2065: -- This one has been sitting in the Patch Available queue for a while...are there issues that still need to be resolved? RCFile issues - Key: HIVE-2065 URL: https://issues.apache.org/jira/browse/HIVE-2065 Project: Hive Issue Type: Bug Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt, Slide1.png, proposal.png Some potential issues with RCFile:
1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions.
2. Record length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}
out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength); // key portion length
if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}
3. For sequence file compatibility, the compressed key length should be the next field to record length, not the uncompressed key length.
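Point 2 above can be restated as: the record length should be computed from the compressed key that is actually written. A toy writer illustrating the corrected ordering; zlib stands in for the configured codec, and this is not the real RCFile wire format:

```python
import io
import struct
import zlib

def write_record(out, key, value):
    """Compress the key first, then write a record length that reflects
    the compressed size actually stored (big-endian ints, as in Java)."""
    compressed_key = zlib.compress(key)
    out.write(struct.pack(">i", len(compressed_key) + len(value)))  # total record length
    out.write(struct.pack(">i", len(compressed_key)))               # key portion length
    out.write(compressed_key)
    out.write(value)

buf = io.BytesIO()
write_record(buf, b"key" * 50, b"value")
```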
Re: Review Request: HIVE-2272: add TIMESTAMP data type
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1135/ --- (Updated 2011-07-26 21:11:35.218104) Review request for hive. Changes --- Rebase Summary --- Adds TIMESTAMP type to serde2 with both string (LazySimple) and binary (LazyBinary) serialization. Supports SQL-style JDBC timestamps of the format YYYY-MM-DD HH:MM:SS[.fff...] with nanosecond precision. This addresses bug HIVE-2272. https://issues.apache.org/jira/browse/HIVE-2272 Diffs (updated) - trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDate.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateAdd.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDayOfMonth.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMinute.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMonth.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSecond.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 1151189 
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFWeekOfYear.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFYear.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovariance.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovarianceSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStdSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java PRE-CREATION trunk/ql/src/test/queries/clientnegative/invalid_t_create3.q 1151189 trunk/ql/src/test/queries/clientpositive/timestamp_1.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_2.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_3.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_comparison.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_udf.q PRE-CREATION 
trunk/ql/src/test/results/clientnegative/invalid_create_tbl1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_alter1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_alter2.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_create1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_create2.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_transform.q.out 1151189 trunk/ql/src/test/results/clientnegative/wrong_column_type.q.out 1151189 trunk/ql/src/test/results/clientpositive/show_functions.q.out 1151189 trunk/ql/src/test/results/clientpositive/timestamp_1.q.out PRE-CREATION
[jira] [Commented] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071368#comment-13071368 ] jirapos...@reviews.apache.org commented on HIVE-2272: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1135/ --- (Updated 2011-07-26 21:11:35.218104) Review request for hive. Changes --- Rebase Summary --- Adds TIMESTAMP type to serde2 with both string (LazySimple) and binary (LazyBinary) serialization. Supports SQL-style JDBC timestamps of the format YYYY-MM-DD HH:MM:SS[.fff...] with nanosecond precision. This addresses bug HIVE-2272. https://issues.apache.org/jira/browse/HIVE-2272 Diffs (updated) - trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDate.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateAdd.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDayOfMonth.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMinute.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMonth.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSecond.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 1151189 
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFWeekOfYear.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFYear.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovariance.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovarianceSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStdSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java 1151189 trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java PRE-CREATION trunk/ql/src/test/queries/clientnegative/invalid_t_create3.q 1151189 trunk/ql/src/test/queries/clientpositive/timestamp_1.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_2.q PRE-CREATION 
trunk/ql/src/test/queries/clientpositive/timestamp_3.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_comparison.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/timestamp_udf.q PRE-CREATION trunk/ql/src/test/results/clientnegative/invalid_create_tbl1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_alter1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_alter2.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_create1.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_create2.q.out 1151189 trunk/ql/src/test/results/clientnegative/invalid_t_transform.q.out
[jira] [Created] (HIVE-2308) Throw an error if user specifies unsupported FS in LOCATION clause of CREATE TABLE
Throw an error if user specifies unsupported FS in LOCATION clause of CREATE TABLE -- Key: HIVE-2308 URL: https://issues.apache.org/jira/browse/HIVE-2308 Project: Hive Issue Type: Bug Components: SQL Reporter: Carl Steinbach
[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2309: Attachment: HIVE-2309.1.patch Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10:
{code}
re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", 'attempt_201107090429_64965_m_001210_10').group(1)
'10'
re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", 'attempt_201107090429_64965_m_001210_9').group(1)
'001210'
{code}
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071409#comment-13071409 ] Siying Dong commented on HIVE-2309: --- Can we limit the number of digits for the attempt ID? Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10:
{code}
re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", 'attempt_201107090429_64965_m_001210_10').group(1)
'10'
re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$", 'attempt_201107090429_64965_m_001210_9').group(1)
'001210'
{code}
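The bug and one possible fix can be demonstrated directly in Python. The single-digit group `(_[0-9])?` cannot absorb a two-digit attempt suffix, so the task-id group ends up capturing the attempt number instead. Widening the suffix group to `(_[0-9]+)?` is a hypothetical fix, not necessarily what HIVE-2309.1.patch does:

```python
import re

BUGGY = r"^.*?([0-9]+)(_[0-9])?(\..*)?$"
# Hypothetical fix: allow the attempt suffix to be more than one digit.
FIXED = r"^.*?([0-9]+)(_[0-9]+)?(\..*)?$"

def task_id(filename, pattern):
    """Extract the task id (first capture group) from an attempt filename."""
    return re.match(pattern, filename).group(1)
```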
[jira] [Commented] (HIVE-2231) Column aliases
[ https://issues.apache.org/jira/browse/HIVE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071412#comment-13071412 ] Adam Kramer commented on HIVE-2231: --- The use case here is basically providing backwards compatibility. So for many users of a table, and many new users of a table, they are using the same table and want to refer to it as such; it is the canonical table. But sometimes the table was originally named with crummy names, and it'd be better and cleaner to document and train new people on the appropriate names. Views eat up the namespace and provide a level of misdirection that is not always desirable, but here are the two biggest limitations of views: * SELECT * is not fast. I can't SELECT * on a view and get data immediately in the same way that I would upon writing the same query. This is true even when the schema are exactly the same. * Partitions are not see-through. I can't use show partitions on a view or write any automated system based on the view to identify when new partitions land, which forces reference to the original table, and then all is lost. Column aliases -- Key: HIVE-2231 URL: https://issues.apache.org/jira/browse/HIVE-2231 Project: Hive Issue Type: Wish Components: Query Processor Reporter: Adam Kramer Priority: Trivial It would be nice in several cases to be able to alias column names. Say someone in your company CREATEd a TABLE called important_but_named_poorly (alvin BIGINT, theodore BIGINT, simon STRING) PARTITIONED BY (dave STRING), that indexes the relationship between an actor (alvin), a target (theodore), and the interaction between them (simon), partitioned based on the date string (dave). Renaming the columns would break a million pipelines that are important but ownerless. 
It would be awesome to define an aliasing system as such: ALTER TABLE important_but_named_poorly REPLACE COLUMNS (actor BIGINT AKA alvin, target BIGINT AKA theodore, ixn STRING AKA simon) PARTITIONED BY (ds STRING AKA dave); ...which would mean that any user could, e.g., use the term dave to refer to ds if they really wanted to.
[jira] [Updated] (HIVE-1955) Support non-constant expressions for array indexes.
[ https://issues.apache.org/jira/browse/HIVE-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-1955: -- Description: FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array Indexes not Supported dut ...just wrote my own UDF to do this, and it is trivial. We should support this natively. Let foo have these rows: arr i [1,2,3] 1 [3,4,5] 2 [5,4,3] 2 [0,0,1] 0 Then, SELECT arr[i] FROM foo should return: 2 5 3 1 Similarly, for the same table, SELECT 3 IN arr FROM foo should return: true true true false ...these use cases are needless limitations of functionality. We shouldn't need UDFs to accomplish these goals. was: FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array Indexes not Supported dut ...just wrote my own UDF to do this, and it is trivial. We should support this natively. Support non-constant expressions for array indexes. --- Key: HIVE-1955 URL: https://issues.apache.org/jira/browse/HIVE-1955 Project: Hive Issue Type: Improvement Reporter: Adam Kramer FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array Indexes not Supported dut ...just wrote my own UDF to do this, and it is trivial. We should support this natively. Let foo have these rows: arr i [1,2,3] 1 [3,4,5] 2 [5,4,3] 2 [0,0,1] 0 Then, SELECT arr[i] FROM foo should return: 2 5 3 1 Similarly, for the same table, SELECT 3 IN arr FROM foo should return: true true true false ...these use cases are needless limitations of functionality. We shouldn't need UDFs to accomplish these goals.
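A rough Python analogue of the requested per-row semantics, assuming zero-based array indexing as in Hive's constant-index array access:

```python
# (arr, i) pairs from the example table foo in the issue description.
rows = [([1, 2, 3], 1), ([3, 4, 5], 2), ([5, 4, 3], 2), ([0, 0, 1], 0)]

# SELECT arr[i] FROM foo -- index each row's array by that row's column i.
indexed = [arr[i] for arr, i in rows]

# SELECT 3 IN arr FROM foo -- per-row membership test.
membership = [3 in arr for arr, _ in rows]
```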
[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-1466: -- Description: NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. was: I just updated the Hive wiki to clarify what some would consider an oddity: When NULL values are exported to a script via TRANSFORM, they are converted to the string \N, and then when the script's output is read, any cell that contains only \N is treated as a NULL value. I believe that there are very VERY few reasons why anyone would need cells that contain only a backslash and then a capital N to be distinguished from NULL cells, but for complete generality, we should allow this. The way to do that is probably by adding a specification in the ROW FORMAT for a table that would allow any string to be treated as a NULL if it is the only string in a cell. Some may prefer the empty string, others the word NULL in caps, etc. I vote for keeping \N as the default because I am used to it, but also for allowing this to be customized. Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: Improvement Reporter: Adam Kramer NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
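The configurable null marker HIVE-1466 proposes can be sketched as a tiny serializer (names hypothetical; the default marker is the literal two characters backslash-N, which is what Hive's TRANSFORM serialization historically emits for NULL):

```python
# Sketch of ROW FORMAT NULL DEFINED AS '...' applied uniformly to
# serialization and deserialization, per HIVE-1466. Hypothetical helper names.

def serialize_row(row, null_marker="\\N", delim="\t"):
    """Render None cells with the configured marker, as a TRANSFORM feed would."""
    return delim.join(null_marker if cell is None else str(cell) for cell in row)

def deserialize_row(line, null_marker="\\N", delim="\t"):
    """Read the marker back as NULL (None), symmetric with serialize_row."""
    return [None if cell == null_marker else cell for cell in line.split(delim)]

line = serialize_row(["a", None, "c"])
print(repr(line))                  # 'a\t\\N\tc'
print(deserialize_row(line))       # ['a', None, 'c']
```

The point of the issue is exactly this symmetry: whatever marker the table declares should apply to every export and save path, not just TRANSFORM.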
[jira] [Created] (HIVE-2311) TRANSFORM statements should come with their own ROW FORMATs.
TRANSFORM statements should come with their own ROW FORMATs. Key: HIVE-2311 URL: https://issues.apache.org/jira/browse/HIVE-2311 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Adam Kramer Sometimes Hive tables contain tabs and/or other characters that could easily be misinterpreted by a transformer as a delimiter. This can break many TRANSFORM queries. The solution is to have a ROW FORMAT semantics that can be attached to an individual TRANSFORM instance. It would have the same semantics as table creation, but during serialization it would ensure that any formal delimiter characters that did not indicate an actual break between columns would be escaped. At the very least, it is a bug that TRANSFORM statement deserialization does not backslash out literal tabs in the current implementation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2311) TRANSFORM statements should come with their own ROW FORMATs.
[ https://issues.apache.org/jira/browse/HIVE-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-2311: -- Priority: Minor (was: Major) Issue Type: Bug (was: Improvement) TRANSFORM statements should come with their own ROW FORMATs. Key: HIVE-2311 URL: https://issues.apache.org/jira/browse/HIVE-2311 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Adam Kramer Priority: Minor Sometimes Hive tables contain tabs and/or other characters that could easily be misinterpreted by a transformer as a delimiter. This can break many TRANSFORM queries. The solution is to have a ROW FORMAT semantics that can be attached to an individual TRANSFORM instance. It would have the same semantics as table creation, but during serialization it would ensure that any formal delimiter characters that did not indicate an actual break between columns would be escaped. At the very least, it is a bug that TRANSFORM statement deserialization does not backslash out literal tabs in the current implementation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
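The delimiter escaping HIVE-2311 asks for can be sketched as follows (hypothetical — current Hive does not do this; a real implementation would live in the serializer):

```python
import re

# Cell values containing the field delimiter are backslash-escaped before the
# row is fed to a TRANSFORM script, so only real column boundaries look like
# delimiters. Simplified sketch.

def escape_cell(cell, delim="\t"):
    # Escape the escape character first, then the delimiter itself.
    return cell.replace("\\", "\\\\").replace(delim, "\\" + delim)

def unescape_cell(cell, delim="\t"):
    return cell.replace("\\" + delim, delim).replace("\\\\", "\\")

def split_escaped(line, delim="\t"):
    # Split only on delimiters not preceded by a backslash (simplified; a full
    # implementation would count consecutive backslashes).
    return [unescape_cell(p) for p in re.split(r"(?<!\\)" + re.escape(delim), line)]

row = ["id1", "free text\twith a tab"]
encoded = "\t".join(escape_cell(c) for c in row)
# encoded still has exactly one unescaped tab: the real column boundary.
print(split_escaped(encoded))  # ['id1', 'free text\twith a tab']
```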
[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2309: Attachment: HIVE-2309.2.patch Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10: {code} re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1) '10' re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1) '001210' {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
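The report's snippets are Python, so the bug reproduces directly with the `re` module. The optional `(_[0-9])` group accepts only a single digit, so for attempt numbers >= 10 the attempt number itself is captured as the "task id":

```python
import re

# The extraction pattern from Utilities.java, transliterated as in the report.
pattern = r"^.*?([0-9]+)(_[0-9])?(\..*)?$"
print(re.match(pattern, "attempt_201107090429_64965_m_001210_10").group(1))  # 10 -- wrong
print(re.match(pattern, "attempt_201107090429_64965_m_001210_9").group(1))   # 001210 -- intended

# One possible fix (not necessarily what the attached patch does): allow the
# attempt-number suffix to be multi-digit, so it is consumed by group 2.
fixed = r"^.*?([0-9]+)(_[0-9]+)?(\..*)?$"
print(re.match(fixed, "attempt_201107090429_64965_m_001210_10").group(1))    # 001210
```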
[jira] [Created] (HIVE-2312) Make CLI variables available to UDFs
Make CLI variables available to UDFs Key: HIVE-2312 URL: https://issues.apache.org/jira/browse/HIVE-2312 Project: Hive Issue Type: Improvement Components: CLI, Clients, UDF Reporter: Adam Kramer Straightforward use case: My UDFs should be able to condition on whether hive.mapred.mode=strict or nonstrict. But these things could also be useful for certain optimizations. For example, a UDAF knowing that there is only one reduce phase could avoid a lot of pushing data around unnecessarily. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071420#comment-13071420 ] Siying Dong commented on HIVE-2309: --- +1, will commit after tests pass Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10: {code} re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1) '10' re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1) '001210' {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-89) avg() min() max() will get error message
[ https://issues.apache.org/jira/browse/HIVE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-89: --- Fix Version/s: 0.3.0 avg() min() max() will get error message Key: HIVE-89 URL: https://issues.apache.org/jira/browse/HIVE-89 Project: Hive Issue Type: Bug Components: Query Processor Environment: hadoop 0.17.2.1 hive 0.17.0 Reporter: YihueyChyi Assignee: Zheng Shao Fix For: 0.3.0 When I run select min() , max() or avg() ,I will get error message Test table : data rows: 15835023 error message: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver Hadoop web:50030 message From reduce process java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:173) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:243) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:168) ... 2 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:210) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:297) at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:240) ... 
3 more Caused by: java.lang.NumberFormatException: For input string: 2004-12-22 at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224) at java.lang.Double.parseDouble(Double.java:510) at org.apache.hadoop.hive.ql.udf.UDAFAvg.aggregate(UDAFAvg.java:42) ... 10 more -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
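The bottom of the stack trace shows the root cause: UDAFAvg calls Double.parseDouble on each cell, and the column contains date strings like '2004-12-22'. A minimal Python analogue of that failure (helper name hypothetical):

```python
# avg() over a column that holds date strings fails exactly where the trace
# says: numeric parsing of the cell value.
def aggregate_avg(cells):
    total, count = 0.0, 0
    for cell in cells:
        total += float(cell)  # raises ValueError on '2004-12-22', the Python
        count += 1            # counterpart of Java's NumberFormatException
    return total / count

try:
    aggregate_avg(["1.5", "2004-12-22"])
except ValueError as e:
    print("aggregation failed:", e)
```

The fix on the query side is to aggregate a numeric column (or cast first); the date column here was evidently selected into the aggregate.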
[jira] [Updated] (HIVE-1251) TRANSFORM should allow piping or allow cross-subquery assumptions.
[ https://issues.apache.org/jira/browse/HIVE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-1251: -- Description: Many traditional transforms can be accomplished via simple unix commands chained together. For example, the sort phase is an instance of cut -f 1 | sort. However, the TRANSFORM command in Hive doesn't allow for unix-style piping to occur. One classic case where I wish there was piping is when I want to stack a column into several rows: SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py | python reducer.py' AS key, value ...in this case, stacker.py would produce output of this form: key col0 key col1 key col2 ...and then the reducer would reduce the above down to one item per key. In this case, the current workaround is this: SELECT TRANSFORM(a.key, a.col) USING 'python reducer.py' AS key, value FROM (SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py' AS key, col FROM table) ...the problem here is that for the above to work (and it should, indeed, work in a map-only MR task), I must assume that the data output from one subquery will be passed in EXACTLY THE SAME FORMAT to the outer query--i.e., I must assume that Hive will not cut a map or reduce phase in between, or fan out data from the inner query into different mappers in the outer query. As a user, *I should not be allowed to assume* that data coming out of a subquery goes into the nodes for a superquery in the same order...ESPECIALLY in the map phase. was: Many traditional transforms can be accomplished via simple unix commands chained together. For example, the sort phase is an instance of cut -f 1 | sort. However, the TRANSFORM command in Hive doesn't allow for unix-style piping to occur. 
One classic case where I wish there was piping is when I want to stack a column into several rows: SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py | python reducer.py' AS key, value ...in this case, stacker.py would produce output of this form: key col0 key col1 key col2 ...and then the reducer would reduce the above down to one item per key. In this case, the current workaround is this: SELECT TRANSFORM(a.key, a.col) USING 'python reducer.py' AS key, value FROM (SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py' AS key, col FROM table) ...the problem here is that as a user, *I should not be allowed to assume* that the output from the inner query will be passed DIRECTLY to the outer query (i.e., the outer query should not assume that it gets the inner query's output on the same box and in the same order). I know as a programmer that this works fine as a pipe, but when writing Hive code I always wonder--what if Hive decides to run the inner query in a reduce step, and the outer query in a subsequent map step? Broadly, my understanding is that the goal of Hive is to abstract the mapreduce process away from users. To this end, we have syntax (CLUSTER BY) that allows users to assume that a reduce task will occur (but see also https://issues.apache.org/jira/browse/HIVE-835 ), but there is no formal way to force or syntactically assume that the data will NOT be copied or sorted or transformed. I argue that the only case where this would be necessary or desirable would be in the instance of a pipe within a transform...ergo a desire for | to work as expected. An alternative would be for the HQL language definition to explicitly state all conditions that would cause a task boundary to be crossed (so I can make the strong assumption that if none of those conditions obtains, my query will be supported in the future)...but that seems potentially restrictive as the language and Hadoop evolves. 
Summary: TRANSFORM should allow piping or allow cross-subquery assumptions. (was: TRANSFORM should allow pipes in some form) TRANSFORM should allow piping or allow cross-subquery assumptions. -- Key: HIVE-1251 URL: https://issues.apache.org/jira/browse/HIVE-1251 Project: Hive Issue Type: Improvement Reporter: Adam Kramer Many traditional transforms can be accomplished via simple unix commands chained together. For example, the sort phase is an instance of cut -f 1 | sort. However, the TRANSFORM command in Hive doesn't allow for unix-style piping to occur. One classic case where I wish there was piping is when I want to stack a column into several rows: SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py | python reducer.py' AS key, value ...in this case, stacker.py would produce output of this form: key col0 key col1 key col2 ...and then the reducer would reduce the above down to one item per key. In this case, the current workaround is this: SELECT TRANSFORM(a.key, a.col) USING
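The stacker/reducer pipeline the issue describes can be modeled in-process (stacker.py and reducer.py are the issue's own hypothetical scripts; this sketches their behavior, not Hive's execution plan, and assumes "reduce to one item per key" means keeping the first value seen):

```python
# In-process sketch of 'python stacker.py | python reducer.py' from HIVE-1251.
def stacker(rows):
    # (key, col0, col1, col2) -> one (key, value) pair per column.
    for key, *cols in rows:
        for col in cols:
            yield key, col

def reducer(pairs):
    # Keep one value per key (first seen), then emit sorted by key.
    seen = {}
    for key, value in pairs:
        seen.setdefault(key, value)
    return sorted(seen.items())

rows = [("k1", "a", "b", "c"), ("k2", "d", "e", "f")]
print(reducer(stacker(rows)))  # [('k1', 'a'), ('k2', 'd')]
```

A shell pipe gives this chaining for free within one process; the issue's point is that the subquery workaround only behaves the same if Hive never inserts a task boundary or reshuffle between the two TRANSFORMs, which users cannot safely assume.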
[jira] [Updated] (HIVE-10) [Hive] filter is executed after the join
[ https://issues.apache.org/jira/browse/HIVE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-10: --- Fix Version/s: 0.3.0 [Hive] filter is executed after the join Key: HIVE-10 URL: https://issues.apache.org/jira/browse/HIVE-10 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.3.0 Filter is not pushed above the join in Hive currently. This can be pretty expensive if the filter is highly selective. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-39) Hive: we should be able to specify a column without a table/alias name
[ https://issues.apache.org/jira/browse/HIVE-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-39: --- Fix Version/s: 0.3.0 Hive: we should be able to specify a column without a table/alias name -- Key: HIVE-39 URL: https://issues.apache.org/jira/browse/HIVE-39 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Zheng Shao Assignee: Ashish Thusoo Fix For: 0.3.0 SELECT field1, field2 from table1 should work, just as SELECT table1.field1, table1.field2 from table1 For join, the situation will be a bit more complicated. If the 2 join operands have columns of the same name, then we should output an ambiguity error. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-58) [hive] join condition does not allow a simple filter
[ https://issues.apache.org/jira/browse/HIVE-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-58: --- Fix Version/s: 0.3.0 [hive] join condition does not allow a simple filter Key: HIVE-58 URL: https://issues.apache.org/jira/browse/HIVE-58 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.3.0 In the join condition, a simple filter condition cannot be specified. For example, select from A join B ON (A.a = B.b and A.x = 10); is not supported. This can be very useful specially in case of outer joins. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
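Why a filter inside the join condition matters "specially in case of outer joins" can be shown with sqlite3 from the Python standard library (table and column names mirror the issue's example but the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER, x INTEGER);
    CREATE TABLE B (b INTEGER);
    INSERT INTO A VALUES (1, 10), (2, 20);
    INSERT INTO B VALUES (1), (2);
""")

# Filter inside ON: unmatched left rows survive with NULLs on the right.
on_rows = conn.execute(
    "SELECT A.a, B.b FROM A LEFT OUTER JOIN B ON (A.a = B.b AND A.x = 10)"
).fetchall()

# Same predicate in WHERE: it runs after the join and drops the a=2 row.
where_rows = conn.execute(
    "SELECT A.a, B.b FROM A LEFT OUTER JOIN B ON (A.a = B.b) WHERE A.x = 10"
).fetchall()

print(on_rows)     # [(1, 1), (2, None)]
print(where_rows)  # [(1, 1)]
```

For inner joins the two placements are equivalent, which is why the restriction mostly hurts outer-join queries.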
[jira] [Updated] (HIVE-26) [Hive] uppercase alias with a join not working
[ https://issues.apache.org/jira/browse/HIVE-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-26: --- Fix Version/s: 0.3.0 [Hive] uppercase alias with a join not working -- Key: HIVE-26 URL: https://issues.apache.org/jira/browse/HIVE-26 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.3.0 EXPLAIN FROM (SELECT src.* FROM src) x JOIN (SELECT src.* FROM src) Y ON (x.key = Y.key) SELECT Y.*; -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-836) Add syntax to force a new mapreduce job / transform subquery in mapper
[ https://issues.apache.org/jira/browse/HIVE-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-836: - Description: Hive currently does a lot of awesome work to figure out when my transformers should be used in the mapper and when they should be used in the reducer. However, sometimes I have a different plan. For example, consider this: {code:title=foo.sql} SELECT TRANSFORM(a.val1, a.val2) USING './niftyscript' AS part1, part2, part3 FROM ( SELECT b.val AS val1, c.val AS val2 FROM tblb b JOIN tblc c on (b.key=c.key) ) a {code} ...now, assume that the join step is very easy and 'niftyscript' is really processor intensive. The ideal format for this is a MR task with few mappers and few reducers, and then a second MR task with lots of mappers. Currently, there is no way to even require the outer TRANSFORM statement occur in a separate map phase. Implementing a hint such as /* +MAP */, akin to /* +MAPJOIN(x) */, would be awesome. Current workaround is to dump everything to a temporary table and then start over, but that is not an easy to scale--the subquery structure effectively (and easily) locks the mid-points so no other job can touch the table. was: Hive currently does a lot of awesome work to figure out when my transformers should be used in the mapper and when they should be used in the reducer. However, sometimes I have a different plan. For example, consider this: SELECT TRANSFORM(a.val1, a.val2) USING './niftyscript' AS part1, part2, part3 FROM ( SELECT b.val AS val1, c.val AS val2 FROM tblb b JOIN tblc c on (b.key=c.key) ) a ...in this syntax b and c will be joined (in the reducer, of course), and then the rows that pass the join clause will be passed to niftyscript _in the reducer._ However, when niftyscript is high-computation and there is a lot of data coming out of the join but very few reducers, there's a huge hold-up. 
It would be awesome if I could somehow force a new mapreduce step after the subquery, so that ./niftyscript is run in the mappers rather than the prior step's reducers. Current workaround is to dump everything to a temporary table and then start over, but that is not an easy to scale--the subquery structure effectively (and easily) locks the mid-points so no other job can touch the table. SUGGESTED FIX: Either cause MAP and REDUCE to force map/reduce steps (c.f. https://issues.apache.org/jira/browse/HIVE-835 ), or add a query element to specify that the job ends here. For example, in the above query, FROM a SELF-CONTAINED or PRECOMPUTE a or START JOB AFTER a or something like that. Add syntax to force a new mapreduce job / transform subquery in mapper -- Key: HIVE-836 URL: https://issues.apache.org/jira/browse/HIVE-836 Project: Hive Issue Type: Wish Reporter: Adam Kramer Hive currently does a lot of awesome work to figure out when my transformers should be used in the mapper and when they should be used in the reducer. However, sometimes I have a different plan. For example, consider this: {code:title=foo.sql} SELECT TRANSFORM(a.val1, a.val2) USING './niftyscript' AS part1, part2, part3 FROM ( SELECT b.val AS val1, c.val AS val2 FROM tblb b JOIN tblc c on (b.key=c.key) ) a {code} ...now, assume that the join step is very easy and 'niftyscript' is really processor intensive. The ideal format for this is a MR task with few mappers and few reducers, and then a second MR task with lots of mappers. Currently, there is no way to even require the outer TRANSFORM statement occur in a separate map phase. Implementing a hint such as /* +MAP */, akin to /* +MAPJOIN(x) */, would be awesome. Current workaround is to dump everything to a temporary table and then start over, but that is not an easy to scale--the subquery structure effectively (and easily) locks the mid-points so no other job can touch the table. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-141) drop table partition behaving oddly - does not create subdirectories
[ https://issues.apache.org/jira/browse/HIVE-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-141: Fix Version/s: 0.3.0 drop table partition behaving oddly - does not create subdirectories Key: HIVE-141 URL: https://issues.apache.org/jira/browse/HIVE-141 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hao Liu Assignee: Prasad Chakka Priority: Critical Fix For: 0.3.0 Original Estimate: 4h Remaining Estimate: 4h for example, I have a table, which has two partitions: tmp_table_name/dt=2008-11-01 tmp_table_name/dt=2008-11-02 When we use hive metastore to drop the first partition (as root), I expect the data file will be moved to user/root/.Trash/081103/tmp_table_name/dt=2008-11-01 by default. However, directory tmp_table_name was not created, the data was moved to user/root/.Trash/081103/dt=2008-11-01, which makes data recovery a very difficult task. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-66) Insert into a dynamic serde table from a MetadataTypedColumnSetSerDe
[ https://issues.apache.org/jira/browse/HIVE-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-66: --- Fix Version/s: 0.3.0 Insert into a dynamic serde table from a MetadataTypedColumnSetSerDe Key: HIVE-66 URL: https://issues.apache.org/jira/browse/HIVE-66 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Priority: Critical Fix For: 0.3.0 Fails with column mismatch error. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-106) Join operation fails for some queries
[ https://issues.apache.org/jira/browse/HIVE-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-106: Fix Version/s: 0.8.0 Join operation fails for some queries - Key: HIVE-106 URL: https://issues.apache.org/jira/browse/HIVE-106 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Josh Ferguson Assignee: Namit Jain Priority: Critical Fix For: 0.8.0 The Tables Are CREATE TABLE activities (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>) PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null), FieldSchema(name:actee_id,type:string,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) CREATE TABLE users (id STRING, properties MAP<STRING, STRING>) PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (id) INTO 32 BUCKETS ROW FORMAT DELIMITED 
COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) A working query is SELECT activities.* FROM activities WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; A non working query is SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; The Exception Is java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262) at 
org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430) at
[jira] [Updated] (HIVE-145) Hive wiki provides incorrect download and setup instructions
[ https://issues.apache.org/jira/browse/HIVE-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-145: Fix Version/s: 0.3.0 Hive wiki provides incorrect download and setup instructions Key: HIVE-145 URL: https://issues.apache.org/jira/browse/HIVE-145 Project: Hive Issue Type: Task Components: Documentation Reporter: Aaron Kimball Assignee: Raghotham Murthy Fix For: 0.3.0 The Getting Started instructions at http://wiki.apache.org/hadoop/Hive/GettingStarted are incorrect. They claim that you should download a dist-17.tar.gz file from a Facebook mirror. This link is 404, and Facebook does not seem to maintain a publicly available Hive package at any other location I can find. Thus, the wiki should be updated to instruct users to checkout/export files from SVN. (This page is locked, so I can't change it myself) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-835) Deprecate, remove, or fix MAP and REDUCE syntax.
[ https://issues.apache.org/jira/browse/HIVE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-835: - Summary: Deprecate, remove, or fix MAP and REDUCE syntax. (was: Make MAP and REDUCE work as expected or add warnings) Deprecate, remove, or fix MAP and REDUCE syntax. Key: HIVE-835 URL: https://issues.apache.org/jira/browse/HIVE-835 Project: Hive Issue Type: Improvement Reporter: Adam Kramer There are syntactic elements MAP and REDUCE which function as syntactic sugar for SELECT TRANSFORM. This behavior is not at all intuitive, because no checking or verification is done to ensure that the user's intention is met. Specifically, Hive may see a MAP query and simply tack the transform script on to the end of a reduce job (so, the user says MAP but hive does a REDUCE), or (more dangerously) vice-versa. Given that Hive's whole point is to sit on top of a mapreduce framework and allow transformations in the mapper or reducer, it seems very inappropriate for Hive to ignore a clear command from the user to MAP or to REDUCE the data using a script, and then simply ignore it. Better behavior would be for hive to see a MAP command and to start a new mapreduce step and run the command in the mapper (even if it otherwise would be run in the reducer), and for REDUCE to begin a reduce step if necessary (so, tack the REDUCE script on to the end of a REDUCE job if the current system would do so, or if not, treat the 0th column as the reduce key, throw a warning saying this has been done, and force a reduce job). Acceptable behavior would be to throw an error or warning when the user's clearly-stated desire is going to be ignored. Warning: User used MAP keyword, but transformation will occur in the reduce phase / Warning: User used REDUCE keyword, but did not specify DISTRIBUTE BY / CLUSTER BY column. Transformation will occur in the map phase. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-211) Add metastore_db to svn ignore
[ https://issues.apache.org/jira/browse/HIVE-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-211: Fix Version/s: 0.3.0 Add metastore_db to svn ignore -- Key: HIVE-211 URL: https://issues.apache.org/jira/browse/HIVE-211 Project: Hive Issue Type: Task Reporter: Johan Oskarsson Assignee: Zheng Shao Priority: Trivial Fix For: 0.3.0 As per HIVE-101 add the metastore_db directory to svn ignore since it shouldn't be committed or added to any patches. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-835) Deprecate, remove, or fix MAP and REDUCE syntax.
[ https://issues.apache.org/jira/browse/HIVE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-835: Component/s: SQL Deprecate, remove, or fix MAP and REDUCE syntax. Key: HIVE-835 URL: https://issues.apache.org/jira/browse/HIVE-835 Project: Hive Issue Type: Improvement Components: SQL Reporter: Adam Kramer There are syntactic elements MAP and REDUCE which function as syntactic sugar for SELECT TRANSFORM. This behavior is not at all intuitive, because no checking or verification is done to ensure that the user's intention is met. Specifically, Hive may see a MAP query and simply tack the transform script on to the end of a reduce job (so, the user says MAP but hive does a REDUCE), or (more dangerously) vice-versa. Given that Hive's whole point is to sit on top of a mapreduce framework and allow transformations in the mapper or reducer, it seems very inappropriate for Hive to ignore a clear command from the user to MAP or to REDUCE the data using a script, and then simply ignore it. Better behavior would be for hive to see a MAP command and to start a new mapreduce step and run the command in the mapper (even if it otherwise would be run in the reducer), and for REDUCE to begin a reduce step if necessary (so, tack the REDUCE script on to the end of a REDUCE job if the current system would do so, or if not, treat the 0th column as the reduce key, throw a warning saying this has been done, and force a reduce job). Acceptable behavior would be to throw an error or warning when the user's clearly-stated desire is going to be ignored. Warning: User used MAP keyword, but transformation will occur in the reduce phase / Warning: User used REDUCE keyword, but did not specify DISTRIBUTE BY / CLUSTER BY column. Transformation will occur in the map phase. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071426#comment-13071426 ] Hudson commented on HIVE-2226: -- Integrated in Hive-trunk-h0.21 #851 (See [https://builds.apache.org/job/Hive-trunk-h0.21/851/]) HIVE-2226. Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. (Sohan Jain via pauly) pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151213 Files : * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Constants.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java * /hive/trunk/metastore/if/hive_metastore.thrift * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote * /hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_constants.php * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java * /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py * /hive/trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb * /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/constants.py * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java * 
/hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similar to the one HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-150) group by count(1) will get error
[ https://issues.apache.org/jira/browse/HIVE-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-150: Fix Version/s: 0.3.0 group by count(1) will get error Key: HIVE-150 URL: https://issues.apache.org/jira/browse/HIVE-150 Project: Hive Issue Type: Bug Components: Build Infrastructure Environment: HADOOP 0.17.2.1 Reporter: YihueyChyi Fix For: 0.3.0 Attachments: hive-150.1.patch HIVEQL: select l.http_user_agent,count(1) from log_resume_all l group by l.http_user_agent Maybe I'll get error in the second stage: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver The second stage : map error java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:151) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:250) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:174) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) syslog logs 2008-12-10 15:41:15,209 DEBUG org.apache.hadoop.mapred.TaskTracker: Child starting 2008-12-10 15:41:15,717 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2008-12-10 15:41:15,805 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 64 2008-12-10 15:41:16,252 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2008-12-10 15:41:16,253 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded initialized native-zlib library 2008-12-10 15:41:16,424 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initializing Self 2008-12-10 15:41:16,428 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias /tmp/hive-root/462573742/46102483.10002 to work list for file 
/tmp/hive-root/462573742/46102483.10002/0015_r_29_0 2008-12-10 15:41:16,438 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Got partitions: null 2008-12-10 15:41:16,438 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Initializing Self 2008-12-10 15:41:16,443 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Using tag = -1 2008-12-10 15:41:16,460 INFO org.apache.hadoop.hive.serde2.thrift.TBinarySortableProtocol: Sort order is 2008-12-10 15:41:16,460 INFO org.apache.hadoop.hive.serde2.thrift.TBinarySortableProtocol: Sort order is 2008-12-10 15:41:16,489 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0 2008-12-10 15:41:16,495 WARN org.apache.hadoop.mapred.TaskTracker: Error running child java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:151) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:250) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:174) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2286: ClassCastException when building index with security.authorization turned on
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/ --- (Updated 2011-07-26 23:28:13.279889) Review request for hive, John Sichi and Ning Zhang. Changes --- refactor patch to dump query state into an inner class rather than a Stack. Summary --- Save the original HiveOperation/commandType when we generate the index builder task and restore it after we're done generating the task so that the authorization checks make the right decision when deciding what to do. This addresses bug HIVE-2286. https://issues.apache.org/jira/browse/HIVE-2286 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION ql/src/test/results/clientnegative/addpart1.q.out f4da8f1 ql/src/test/results/clientnegative/alter_concatenate_indexed_table.q.out 8ae1f9d ql/src/test/results/clientnegative/alter_non_native.q.out 8be2c3b ql/src/test/results/clientnegative/alter_view_failure.q.out 9954b66 ql/src/test/results/clientnegative/alter_view_failure2.q.out 5915b4f ql/src/test/results/clientnegative/alter_view_failure4.q.out 97d6b18 ql/src/test/results/clientnegative/alter_view_failure5.q.out 2291ca6 ql/src/test/results/clientnegative/alter_view_failure6.q.out 03b2bc3 ql/src/test/results/clientnegative/alter_view_failure7.q.out d0f958c ql/src/test/results/clientnegative/alter_view_failure8.q.out 4420c57 ql/src/test/results/clientnegative/alter_view_failure9.q.out 67306d3 ql/src/test/results/clientnegative/altern1.q.out c52ca04 ql/src/test/results/clientnegative/analyze_view.q.out 99def40 ql/src/test/results/clientnegative/archive1.q.out 0927686 ql/src/test/results/clientnegative/archive2.q.out 25baefa ql/src/test/results/clientnegative/authorization_fail_1.q.out ab1abe2 ql/src/test/results/clientnegative/authorization_fail_3.q.out cd7ceb1 ql/src/test/results/clientnegative/authorization_fail_4.q.out b05f9b7 ql/src/test/results/clientnegative/authorization_fail_5.q.out f5bdc6a 
ql/src/test/results/clientnegative/authorization_fail_7.q.out a52fd1c ql/src/test/results/clientnegative/authorization_part.q.out 625d60c ql/src/test/results/clientnegative/column_rename1.q.out 7c30e4e ql/src/test/results/clientnegative/column_rename2.q.out 0ca78f9 ql/src/test/results/clientnegative/column_rename4.q.out f14fd48 ql/src/test/results/clientnegative/create_or_replace_view1.q.out 97bfa21 ql/src/test/results/clientnegative/create_or_replace_view2.q.out 8edac34 ql/src/test/results/clientnegative/create_or_replace_view4.q.out 89dd5f5 ql/src/test/results/clientnegative/create_or_replace_view5.q.out a0aed59 ql/src/test/results/clientnegative/create_or_replace_view6.q.out df44e33 ql/src/test/results/clientnegative/create_or_replace_view7.q.out 9356dcc ql/src/test/results/clientnegative/create_or_replace_view8.q.out 4161659 ql/src/test/results/clientnegative/create_view_failure1.q.out 43cded4 ql/src/test/results/clientnegative/create_view_failure2.q.out a038067 ql/src/test/results/clientnegative/create_view_failure4.q.out f968569 ql/src/test/results/clientnegative/database_create_already_exists.q.out 08c04f9 ql/src/test/results/clientnegative/database_create_invalid_name.q.out 1e58089 ql/src/test/results/clientnegative/database_drop_does_not_exist.q.out 80c00cd ql/src/test/results/clientnegative/database_drop_not_empty.q.out baa8f37 ql/src/test/results/clientnegative/database_drop_not_empty_restrict.q.out b297a99 ql/src/test/results/clientnegative/database_switch_does_not_exist.q.out 8b5674d ql/src/test/results/clientnegative/drop_partition_failure.q.out 8a7c63d ql/src/test/results/clientnegative/drop_table_failure2.q.out 9b63102 ql/src/test/results/clientnegative/drop_view_failure1.q.out 61ec927 ql/src/test/results/clientnegative/dyn_part3.q.out 5f4df65 ql/src/test/results/clientnegative/exim_00_unsupported_schema.q.out 814b742 ql/src/test/results/clientnegative/exim_01_nonpart_over_loaded.q.out 0351bc1 
ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out d40ff27 ql/src/test/results/clientnegative/exim_03_nonpart_noncompat_colschema.q.out adff0f8 ql/src/test/results/clientnegative/exim_04_nonpart_noncompat_colnumber.q.out b84e954 ql/src/test/results/clientnegative/exim_05_nonpart_noncompat_coltype.q.out 96f8452 ql/src/test/results/clientnegative/exim_06_nonpart_noncompat_storage.q.out 25deaa3 ql/src/test/results/clientnegative/exim_07_nonpart_noncompat_ifof.q.out f9c3d5a ql/src/test/results/clientnegative/exim_08_nonpart_noncompat_serde.q.out 12c737a ql/src/test/results/clientnegative/exim_09_nonpart_noncompat_serdeparam.q.out 77afe3a
[jira] [Updated] (HIVE-2286) ClassCastException when building index with security.authorization turned on
[ https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed S. Albiz updated HIVE-2286: Attachment: HIVE-2286.6.patch ClassCastException when building index with security.authorization turned on Key: HIVE-2286 URL: https://issues.apache.org/jira/browse/HIVE-2286 Project: Hive Issue Type: Bug Reporter: Syed S. Albiz Assignee: Syed S. Albiz Attachments: HIVE-2286.1.patch, HIVE-2286.2.patch, HIVE-2286.6.patch When trying to build an index with authorization checks turned on, hive issues the following ClassCastException: org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer cannot be cast to org.apache.hadoop.hive.ql.parse.SemanticAnalyzer at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:540) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:848) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:224) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:293) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:385) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:392) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:567) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav a:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2286) ClassCastException when building index with security.authorization turned on
[ https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed S. Albiz updated HIVE-2286: Status: Patch Available (was: Open) ClassCastException when building index with security.authorization turned on Key: HIVE-2286 URL: https://issues.apache.org/jira/browse/HIVE-2286 Project: Hive Issue Type: Bug Reporter: Syed S. Albiz Assignee: Syed S. Albiz Attachments: HIVE-2286.1.patch, HIVE-2286.2.patch, HIVE-2286.6.patch When trying to build an index with authorization checks turned on, hive issues the following ClassCastException: org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer cannot be cast to org.apache.hadoop.hive.ql.parse.SemanticAnalyzer at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:540) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:848) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:224) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:293) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:385) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:392) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:567) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav a:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor Impl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2286) ClassCastException when building index with security.authorization turned on
[ https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071428#comment-13071428 ] jirapos...@reviews.apache.org commented on HIVE-2286: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1137/ --- (Updated 2011-07-26 23:28:13.279889) Review request for hive, John Sichi and Ning Zhang. Changes --- refactor patch to dump query state into an inner class rather than a Stack. Summary --- Save the original HiveOperation/commandType when we generate the index builder task and restore it after we're done generating the task so that the authorization checks make the right decision when deciding what to do. This addresses bug HIVE-2286. https://issues.apache.org/jira/browse/HIVE-2286 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION ql/src/test/results/clientnegative/addpart1.q.out f4da8f1 ql/src/test/results/clientnegative/alter_concatenate_indexed_table.q.out 8ae1f9d ql/src/test/results/clientnegative/alter_non_native.q.out 8be2c3b ql/src/test/results/clientnegative/alter_view_failure.q.out 9954b66 ql/src/test/results/clientnegative/alter_view_failure2.q.out 5915b4f ql/src/test/results/clientnegative/alter_view_failure4.q.out 97d6b18 ql/src/test/results/clientnegative/alter_view_failure5.q.out 2291ca6 ql/src/test/results/clientnegative/alter_view_failure6.q.out 03b2bc3 ql/src/test/results/clientnegative/alter_view_failure7.q.out d0f958c ql/src/test/results/clientnegative/alter_view_failure8.q.out 4420c57 ql/src/test/results/clientnegative/alter_view_failure9.q.out 67306d3 ql/src/test/results/clientnegative/altern1.q.out c52ca04 ql/src/test/results/clientnegative/analyze_view.q.out 99def40 ql/src/test/results/clientnegative/archive1.q.out 0927686 ql/src/test/results/clientnegative/archive2.q.out 25baefa ql/src/test/results/clientnegative/authorization_fail_1.q.out 
ab1abe2 ql/src/test/results/clientnegative/authorization_fail_3.q.out cd7ceb1 ql/src/test/results/clientnegative/authorization_fail_4.q.out b05f9b7 ql/src/test/results/clientnegative/authorization_fail_5.q.out f5bdc6a ql/src/test/results/clientnegative/authorization_fail_7.q.out a52fd1c ql/src/test/results/clientnegative/authorization_part.q.out 625d60c ql/src/test/results/clientnegative/column_rename1.q.out 7c30e4e ql/src/test/results/clientnegative/column_rename2.q.out 0ca78f9 ql/src/test/results/clientnegative/column_rename4.q.out f14fd48 ql/src/test/results/clientnegative/create_or_replace_view1.q.out 97bfa21 ql/src/test/results/clientnegative/create_or_replace_view2.q.out 8edac34 ql/src/test/results/clientnegative/create_or_replace_view4.q.out 89dd5f5 ql/src/test/results/clientnegative/create_or_replace_view5.q.out a0aed59 ql/src/test/results/clientnegative/create_or_replace_view6.q.out df44e33 ql/src/test/results/clientnegative/create_or_replace_view7.q.out 9356dcc ql/src/test/results/clientnegative/create_or_replace_view8.q.out 4161659 ql/src/test/results/clientnegative/create_view_failure1.q.out 43cded4 ql/src/test/results/clientnegative/create_view_failure2.q.out a038067 ql/src/test/results/clientnegative/create_view_failure4.q.out f968569 ql/src/test/results/clientnegative/database_create_already_exists.q.out 08c04f9 ql/src/test/results/clientnegative/database_create_invalid_name.q.out 1e58089 ql/src/test/results/clientnegative/database_drop_does_not_exist.q.out 80c00cd ql/src/test/results/clientnegative/database_drop_not_empty.q.out baa8f37 ql/src/test/results/clientnegative/database_drop_not_empty_restrict.q.out b297a99 ql/src/test/results/clientnegative/database_switch_does_not_exist.q.out 8b5674d ql/src/test/results/clientnegative/drop_partition_failure.q.out 8a7c63d ql/src/test/results/clientnegative/drop_table_failure2.q.out 9b63102 ql/src/test/results/clientnegative/drop_view_failure1.q.out 61ec927 ql/src/test/results/clientnegative/dyn_part3.q.out 
5f4df65 ql/src/test/results/clientnegative/exim_00_unsupported_schema.q.out 814b742 ql/src/test/results/clientnegative/exim_01_nonpart_over_loaded.q.out 0351bc1 ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out d40ff27 ql/src/test/results/clientnegative/exim_03_nonpart_noncompat_colschema.q.out adff0f8 ql/src/test/results/clientnegative/exim_04_nonpart_noncompat_colnumber.q.out b84e954 ql/src/test/results/clientnegative/exim_05_nonpart_noncompat_coltype.q.out 96f8452 ql/src/test/results/clientnegative/exim_06_nonpart_noncompat_storage.q.out 25deaa3
[jira] [Reopened] (HIVE-401) Reduce the ant test time to under 15 minutes
[ https://issues.apache.org/jira/browse/HIVE-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-401: - Yesterday it took me 4 hours to run the tests on trunk. Reduce the ant test time to under 15 minutes Key: HIVE-401 URL: https://issues.apache.org/jira/browse/HIVE-401 Project: Hive Issue Type: Wish Reporter: Zheng Shao Assignee: Zheng Shao Attachments: hive_parallel_test.sh ant test is taking too long. This is a big overhead for development since we need to do context switching all the time. We should bring the time back to under 15 minutes.
[jira] [Updated] (HIVE-494) Select columns by index instead of name
[ https://issues.apache.org/jira/browse/HIVE-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-494: - Description: SELECT mytable[0], mytable[2] FROM some_table_name mytable; ...should return the first and third columns, respectively, from mytable regardless of their column names. The need for names specifically is kind of silly when they just get translated into numbers anyway. was: In a very real sense, tables are like arrays or matrices with rows and columns. It would be fantastic if I could refer to columns in my select statement by their index, rather than by their name. SELECT mytable[0], mytable[2] FROM some_table_name mytable; ...which would then get the first and third column from mytable. We already have syntax like this for array data types, which I think would translate nicely: SELECT mytable[0][3], etc. Or maybe I just spend too much time coding in R... Priority: Minor (was: Major) Summary: Select columns by index instead of name (was: Select columns by number instead of name) Select columns by index instead of name --- Key: HIVE-494 URL: https://issues.apache.org/jira/browse/HIVE-494 Project: Hive Issue Type: Wish Components: Clients, Query Processor Reporter: Adam Kramer Priority: Minor Labels: SQL SELECT mytable[0], mytable[2] FROM some_table_name mytable; ...should return the first and third columns, respectively, from mytable regardless of their column names. The need for names specifically is kind of silly when they just get translated into numbers anyway.
[jira] [Updated] (HIVE-2204) unable to get column names for a specific table that has '_' as part of its table name
[ https://issues.apache.org/jira/browse/HIVE-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2204: - Fix Version/s: 0.8.0 unable to get column names for a specific table that has '_' as part of its table name -- Key: HIVE-2204 URL: https://issues.apache.org/jira/browse/HIVE-2204 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Mythili Gopalakrishnan Assignee: Patrick Hunt Fix For: 0.8.0 Attachments: HIVE-2204.patch I have a table age_group and I am trying to get list of columns for this table name. As underscore and '%' have special meaning in table search pattern according to JDBC searchPattern string specification, I escape the '_' in my table name when I call getColumns for this single table. But HIVE does not return any columns. My call to getColumns is as follows catalog null schemaPattern % tableNamePattern age\_group columnNamePattern % If I don't escape the '_' in my tableNamePattern, I am able to get the list of columns.
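The JDBC metadata search-pattern rules the reporter refers to treat '_' (any single character) and '%' (any sequence) as wildcards, so a literal table name containing an underscore must be escaped before it is passed as tableNamePattern. A minimal sketch of that escaping follows; the helper name is hypothetical and not part of Hive or the JDBC API:

```java
public class SearchPatternEscaper {
    // Hypothetical helper: prefixes the JDBC LIKE wildcards '_' and '%'
    // with the driver's escape string so they are matched literally.
    public static String escapeLiteral(String name, String escape) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '_' || c == '%') {
                sb.append(escape); // escape the wildcard character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "age_group" becomes "age\_group", as in the reporter's call
        System.out.println(escapeLiteral("age_group", "\\"));
    }
}
```

In a real client the escape string should come from DatabaseMetaData.getSearchStringEscape() rather than being hard-coded, since drivers may use different escape characters.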
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if possible
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Summary: Comparison Operators convert number types to common type instead of double if possible (was: Comparison Operators convert number types to common type instead of double if necessary) Comparison Operators convert number types to common type instead of double if possible -- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2248.1.patch Currently, if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you do WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix.
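The common-type behavior the summary alludes to can be sketched as a walk up the numeric type hierarchy: compare in the wider of the two operand types instead of always in double. The type names and ordering below follow Hive's numeric types, but the helper itself is illustrative and not the actual FunctionRegistry code:

```java
import java.util.Arrays;
import java.util.List;

public class NumericCommonType {
    // Hive's numeric widening order, narrowest to widest.
    private static final List<String> ORDER = Arrays.asList(
        "tinyint", "smallint", "int", "bigint", "float", "double");

    // Illustrative only: picks the wider of two numeric types so the
    // comparison can run in that type rather than always in double.
    public static String commonType(String a, String b) {
        int ia = ORDER.indexOf(a);
        int ib = ORDER.indexOf(b);
        if (ia < 0 || ib < 0) {
            return "double"; // fall back for unknown/non-numeric types
        }
        return ia >= ib ? a : b;
    }
}
```

Under this scheme WHERE BIGINT_COLUMN = 0 compares in bigint, avoiding the per-row double conversion the description calls out.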
[jira] [Reopened] (HIVE-2046) In error scenario some opened streams may not be closed in Utilities.java
[ https://issues.apache.org/jira/browse/HIVE-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-2046: -- In error scenario some opened streams may not be closed in Utilities.java -- Key: HIVE-2046 URL: https://issues.apache.org/jira/browse/HIVE-2046 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2046.Patch 1) In error scenarios the XMLDecoder and XMLEncoder may not be closed in serializeMapRedWork() and deserializeMapRedWork() in Utilities.java 2) The BufferedReader is not closed in Utilities.StreamPrinter
[jira] [Resolved] (HIVE-2046) In error scenario some opened streams may not be closed in Utilities.java
[ https://issues.apache.org/jira/browse/HIVE-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-2046. -- Resolution: Duplicate In error scenario some opened streams may not be closed in Utilities.java -- Key: HIVE-2046 URL: https://issues.apache.org/jira/browse/HIVE-2046 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2046.Patch 1) In error scenarios the XMLDecoder and XMLEncoder may not be closed in serializeMapRedWork() and deserializeMapRedWork() in Utilities.java 2) The BufferedReader is not closed in Utilities.StreamPrinter
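The fix pattern for both reported spots is the same: release the stream in a finally block so the error path cannot leak it (Hive of this era targets Java 6, so try-with-resources is not available). The sketch below shows the pattern for the BufferedReader case; it is illustrative, not the actual Utilities.java code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StreamCloseSketch {
    // Reads everything from the reader, closing it on success and on error.
    public static String readAll(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        try {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } finally {
            br.close(); // runs even if readLine() throws
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("a\nb")));
    }
}
```

The same shape applies to the XMLDecoder/XMLEncoder and DataOutputStream cases: construct, use inside try, close in finally.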
[jira] [Updated] (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2051: - Fix Version/s: 0.8.0 getInputSummary() to call FileSystem.getContentSummary() in parallel Key: HIVE-2051 URL: https://issues.apache.org/jira/browse/HIVE-2051 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, HIVE-2051.4.patch, HIVE-2051.5.patch getInputSummary() currently calls FileSystem.getContentSummary() one path at a time, which can be extremely slow when the number of input paths is huge. By issuing those calls in parallel, we can cut the latency in most cases.
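The parallelization described above can be sketched with a fixed thread pool that issues one summary call per path and sums the resulting futures. Here summarize() is a stand-in for the remote FileSystem.getContentSummary() round trip, and the class is illustrative rather than the actual getInputSummary() patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSummary {
    // Stand-in for FileSystem.getContentSummary(); the real call is a
    // remote round trip per path, which is why parallelism pays off.
    static long summarize(String path) {
        return path.length(); // placeholder "size" for the sketch
    }

    // Submit one summary call per path to a fixed pool, then sum results.
    public static long totalSize(List<String> paths, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<Future<Long>>();
            for (final String p : paths) {
                futures.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        return summarize(p);
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // propagate any per-path failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With N paths and T threads the wall-clock cost drops from roughly N round trips to roughly N/T, which is the latency win the description claims.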
[jira] [Reopened] (HIVE-2044) In error scenario opened streams may not be closed in TypedBytesWritableOutput.java
[ https://issues.apache.org/jira/browse/HIVE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-2044: -- In error scenario opened streams may not be closed in TypedBytesWritableOutput.java Key: HIVE-2044 URL: https://issues.apache.org/jira/browse/HIVE-2044 Project: Hive Issue Type: Bug Components: Contrib Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2044.Patch 1) In error scenarios the DataOutputStream may not be closed in writeWritable of TypedBytesWritableOutput.java
[jira] [Updated] (HIVE-1937) DDLSemanticAnalyzer won't take newly set Hive parameters
[ https://issues.apache.org/jira/browse/HIVE-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1937: - Component/s: Query Processor Fix Version/s: 0.8.0 DDLSemanticAnalyzer won't take newly set Hive parameters Key: HIVE-1937 URL: https://issues.apache.org/jira/browse/HIVE-1937 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-1937.2.patch, HIVE-1937.3.patch, HIVE-1937.patch Hive's DDLSemanticAnalyzer maintains a static reservedPartitionValue set whose values come from several Hive parameters. However, even if these parameters are set to new values, the reservedPartitionValue set is not updated.
[jira] [Reopened] (HIVE-1890) Optimize privilege checking for authorization
[ https://issues.apache.org/jira/browse/HIVE-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-1890: -- Optimize privilege checking for authorization - Key: HIVE-1890 URL: https://issues.apache.org/jira/browse/HIVE-1890 Project: Hive Issue Type: Improvement Components: Security Reporter: Namit Jain Assignee: He Yongqiang Follow-up of HIVE-78. Many queries have a large number of input partitions from the same input table. If the table under consideration has the same privilege for all of its partitions, there is no need to check permissions for every partition: you can find the common tables and skip the per-partition checks altogether.
[jira] [Resolved] (HIVE-1890) Optimize privilege checking for authorization
[ https://issues.apache.org/jira/browse/HIVE-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-1890. -- Resolution: Duplicate Optimize privilege checking for authorization - Key: HIVE-1890 URL: https://issues.apache.org/jira/browse/HIVE-1890 Project: Hive Issue Type: Improvement Components: Security Reporter: Namit Jain Assignee: He Yongqiang Follow-up of HIVE-78. Many queries have a large number of input partitions from the same input table. If the table under consideration has the same privilege for all of its partitions, there is no need to check permissions for every partition: you can find the common tables and skip the per-partition checks altogether.
[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1644: - Fix Version/s: 0.8.0 use filter pushdown for automatically accessing indexes --- Key: HIVE-1644 URL: https://issues.apache.org/jira/browse/HIVE-1644 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: John Sichi Assignee: Russell Melick Fix For: 0.8.0 Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.13.patch, HIVE-1644.14.patch, HIVE-1644.15.patch, HIVE-1644.16.patch, HIVE-1644.17.patch, HIVE-1644.18.patch, HIVE-1644.19.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch, hive.log HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct
[ https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1595: - Fix Version/s: 0.8.0 job name for alter table T archive partition P is not correct - Key: HIVE-1595 URL: https://issues.apache.org/jira/browse/HIVE-1595 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: Hive-1595.1.patch, Hive-1595.2.patch For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which makes the job difficult to identify.
[jira] [Reopened] (HIVE-1490) More implicit type conversion: UNION ALL and COALESCE
[ https://issues.apache.org/jira/browse/HIVE-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reopened HIVE-1490: -- More implicit type conversion: UNION ALL and COALESCE - Key: HIVE-1490 URL: https://issues.apache.org/jira/browse/HIVE-1490 Project: Hive Issue Type: Bug Components: Query Processor, Server Infrastructure Reporter: Adam Kramer Assignee: Syed S. Albiz This is a use case that frequently annoys me: SELECT TRANSFORM(stuff) USING 'script' AS thing1, thing2 FROM some_table UNION ALL SELECT a.thing1, a.thing2 FROM some_other_table a ...this fails when a.thing1 and a.thing2 are anything but STRING, because all output of TRANSFORM is STRING. In this case, a.thing1 and a.thing2 should be implicitly converted to string. COALESCE(a.thing1, a.thing2, a.thing3) should similarly do implicit type conversion among the arguments. If two are INT and one is BIGINT, upgrade the INTs, etc. At the very least, it would be nice to have syntax like SELECT TRANSFORM(stuff) USING 'script' AS thing1 INT, thing2 INT ...which would effectively cast the output columns to the specified types. But really, type conversion should work.
[jira] [Updated] (HIVE-2199) incorrect success flag passed to jobClose
[ https://issues.apache.org/jira/browse/HIVE-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2199: - Fix Version/s: 0.8.0 incorrect success flag passed to jobClose - Key: HIVE-2199 URL: https://issues.apache.org/jira/browse/HIVE-2199 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2199.1.patch For block-level merging of RCFiles, jobClose is passed the incorrect variable as the success flag.
[jira] [Updated] (HIVE-2024) In Driver.execute(), mapred.job.tracker is not restored if one of the task fails.
[ https://issues.apache.org/jira/browse/HIVE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2024: - Component/s: Query Processor Fix Version/s: 0.8.0 In Driver.execute(), mapred.job.tracker is not restored if one of the task fails. - Key: HIVE-2024 URL: https://issues.apache.org/jira/browse/HIVE-2024 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2024.1.patch If a job is automatically determined to run in local mode and one of its tasks fails with a non-zero error code, mapred.job.tracker will remain set to local, which might cause further problems.
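The fix pattern for this class of bug can be sketched with a plain map standing in for Hive's JobConf. This is an illustration of the restore-in-finally idiom, not the actual patch; all names below are hypothetical.

```java
import java.util.Map;

public class RestoreTracker {
    // Remember the original mapred.job.tracker, and restore it in a finally
    // block so a failing local-mode task cannot leave the session pointing
    // at "local". The conf map stands in for the real JobConf.
    static boolean runLocally(Map<String, String> conf, Runnable task) {
        String original = conf.get("mapred.job.tracker");
        conf.put("mapred.job.tracker", "local");
        try {
            task.run();
            return true;
        } catch (RuntimeException e) {
            return false; // analogue of a task exiting with a non-zero code
        } finally {
            conf.put("mapred.job.tracker", original); // restored on all paths
        }
    }
}
```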
[jira] [Updated] (HIVE-2052) PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary
[ https://issues.apache.org/jira/browse/HIVE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2052: - Fix Version/s: 0.8.0 PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary -- Key: HIVE-2052 URL: https://issues.apache.org/jira/browse/HIVE-2052 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2051.3.patch, HIVE-2052.1.patch, HIVE-2052.2.patch, HIVE-2052.3.patch This will allow hooks to share information more effectively and reduce their latency.
[jira] [Updated] (HIVE-2082) Reduce memory consumption in preparing MapReduce job
[ https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2082: - Component/s: Query Processor Fix Version/s: 0.8.0 Reduce memory consumption in preparing MapReduce job Key: HIVE-2082 URL: https://issues.apache.org/jira/browse/HIVE-2082 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch The Hive client side consumes a lot of memory when the number of input partitions is large. One reason is that each partition maintains a list of FieldSchema objects intended to deal with schema evolution. However, they are not currently used, and Hive uses the table-level schema for all partitions. This will be fixed in HIVE-2050; the memory consumption from this part will be reduced by almost half (1.2GB to 700MB for 20k partitions). Another large chunk of memory is consumed in the MapReduce job setup phase, when a PartitionDesc is created from each Partition object. A Properties object maintained in PartitionDesc contains a full list of columns and types, and for the same reason these should be the same as the table-level schema. The deserializer initialization also takes a large amount of memory, which should be avoided. My initial testing of these optimizations cut the memory consumption in half (700MB to 300MB for 20k partitions).
[jira] [Updated] (HIVE-178) SELECT without FROM should assume a one-row table with no columns.
[ https://issues.apache.org/jira/browse/HIVE-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kramer updated HIVE-178: - Component/s: Testing Infrastructure Description: SELECT 1+1; should just return '2', but instead hive fails because no table is listed. SELECT 1+1 FROM (empty table); should also just return '2', but instead hive succeeds because there is no possible output, so it produces no output. So, currently we have to run SELECT 1+1 FROM (silly one-row dummy table); ...which runs a whole mapreduce step to ignore a column of data that is useless anyway. This is much easier due to local mode, but still, it would be nice to be able to SELECT without specifying a table and to get one row of output in moments instead of waiting for even a local-mode job to launch, complete, and return. This is especially useful for testing UDFs. Relatedly, an optimization by which Hive can tell that data from a table isn't even USED would be useful, because it means that the data needn't be queried...the only relevant info from the table would be the number of rows it has, which is available for free from the metastore. was: SELECT 1+1; should just return '2', but instead hive fails because no table is listed. SELECT 1+1 FROM (empty table); should also just return '2', but instead hive succeeds because there is no possible output, so it produces no output. So, currently we have to run SELECT 1+1 FROM (silly one-row dummy table); ...which runs a whole mapreduce step to ignore a column of data that is useless anyway. This is much easier due to local mode, but still, it would be nice to be able to SELECT without specifying a table and to get one row of output in moments instead of waiting for even a local-mode job to launch, complete, and return. 
Relatedly, an optimization by which Hive can tell that data from a table isn't even USED would be useful, because it means that the data needn't be queried...the only relevant info from the table would be the number of rows it has, which is available for free from the metastore. SELECT without FROM should assume a one-row table with no columns. -- Key: HIVE-178 URL: https://issues.apache.org/jira/browse/HIVE-178 Project: Hive Issue Type: Wish Components: Query Processor, Testing Infrastructure Reporter: Adam Kramer Priority: Minor Labels: SQL SELECT 1+1; should just return '2', but instead hive fails because no table is listed. SELECT 1+1 FROM (empty table); should also just return '2', but instead hive succeeds because there is no possible output, so it produces no output. So, currently we have to run SELECT 1+1 FROM (silly one-row dummy table); ...which runs a whole mapreduce step to ignore a column of data that is useless anyway. This is much easier due to local mode, but still, it would be nice to be able to SELECT without specifying a table and to get one row of output in moments instead of waiting for even a local-mode job to launch, complete, and return. This is especially useful for testing UDFs. Relatedly, an optimization by which Hive can tell that data from a table isn't even USED would be useful, because it means that the data needn't be queried...the only relevant info from the table would be the number of rows it has, which is available for free from the metastore.
[jira] [Updated] (HIVE-2096) throw a error if the input is larger than a threshold for index input format
[ https://issues.apache.org/jira/browse/HIVE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2096: - Component/s: Query Processor Diagnosability Fix Version/s: 0.8.0 throw a error if the input is larger than a threshold for index input format Key: HIVE-2096 URL: https://issues.apache.org/jira/browse/HIVE-2096 Project: Hive Issue Type: Bug Components: Diagnosability, Query Processor Affects Versions: 0.8.0 Reporter: Namit Jain Fix For: 0.8.0 Attachments: HIVE-2096.1.patch.txt, HIVE-2096.2.patch.txt, HIVE-2096.3.patch.txt, HIVE-2096.4.patch.txt This can hang forever.
[jira] [Assigned] (HIVE-2096) throw a error if the input is larger than a threshold for index input format
[ https://issues.apache.org/jira/browse/HIVE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reassigned HIVE-2096: Assignee: Wojciech Galuba throw a error if the input is larger than a threshold for index input format Key: HIVE-2096 URL: https://issues.apache.org/jira/browse/HIVE-2096 Project: Hive Issue Type: Bug Components: Diagnosability, Query Processor Affects Versions: 0.8.0 Reporter: Namit Jain Assignee: Wojciech Galuba Fix For: 0.8.0 Attachments: HIVE-2096.1.patch.txt, HIVE-2096.2.patch.txt, HIVE-2096.3.patch.txt, HIVE-2096.4.patch.txt This can hang forever.
[jira] [Updated] (HIVE-2106) Increase the number of operator counter
[ https://issues.apache.org/jira/browse/HIVE-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2106: - Fix Version/s: 0.8.0 Increase the number of operator counter Key: HIVE-2106 URL: https://issues.apache.org/jira/browse/HIVE-2106 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2106.patch Currently, Hadoop counters have to be defined as an enum (hardcoded), and we support up to 400 counters now. This limits the number of operators to 100 (each operator has 4 counters). We need to increase the number of Hadoop counters or change the Hive code to use the Hadoop 0.20 API.
[jira] [Updated] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus
[ https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2186: - Fix Version/s: 0.8.0 Dynamic Partitioning Failing because of characters not supported globStatus --- Key: HIVE-2186 URL: https://issues.apache.org/jira/browse/HIVE-2186 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2186.1.patch, hive-2186.2.patch, hive-2186.3.patch, hive-2186.4.patch, hive-2186.5.patch Some dynamic partition queries fail at the partition-loading stage if the dynamic partition columns contain special characters. We need to escape all of them.
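The escaping the report calls for can be sketched as follows. This is illustrative only: the metacharacter list below is an assumption covering common glob syntax, not necessarily the exact set FileSystem.globStatus uses, and the class name is hypothetical.

```java
public class GlobEscape {
    // Assumed glob metacharacters (braces, brackets, wildcards, etc.);
    // the real fix would use whatever set globStatus actually interprets.
    private static final String GLOB_SPECIALS = "{}[]*?\\^$()+,";

    // Backslash-escape each metacharacter so a literal partition path
    // value is matched verbatim rather than interpreted as a pattern.
    static String escapeGlob(String path) {
        StringBuilder sb = new StringBuilder();
        for (char c : path.toCharArray()) {
            if (GLOB_SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```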
[jira] [Updated] (HIVE-2159) TableSample(percent ) uses one intermediate size to be int, which overflows for large sampled size, making the sampling never triggered.
[ https://issues.apache.org/jira/browse/HIVE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2159: - Component/s: Query Processor Fix Version/s: 0.8.0 TableSample(percent ) uses one intermediate size to be int, which overflows for large sampled size, making the sampling never triggered. Key: HIVE-2159 URL: https://issues.apache.org/jira/browse/HIVE-2159 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2159.1.patch
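The overflow shape HIVE-2159 describes can be reproduced in a few lines. The method names are illustrative, not Hive's; the point is that narrowing a byte-count intermediate to int wraps for inputs over 2GB, so the computed sample target is wrong and the sampling condition never triggers.

```java
public class SampleSize {
    // Bug shape: the intermediate is narrowed to int, which wraps for
    // large inputs (here it goes negative for a 100GB total).
    static long sampledBytesBuggy(long totalBytes, int percent) {
        int intermediate = (int) (totalBytes * percent / 100); // truncates
        return intermediate;
    }

    // Fix: keep every intermediate in long.
    static long sampledBytes(long totalBytes, int percent) {
        return totalBytes * percent / 100;
    }
}
```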
[jira] [Updated] (HIVE-2121) Input Sampling By Splits
[ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2121: - Component/s: Query Processor Fix Version/s: 0.8.0 Input Sampling By Splits Key: HIVE-2121 URL: https://issues.apache.org/jira/browse/HIVE-2121 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch, HIVE-2121.7.patch, HIVE-2121.8.patch We need better input sampling to serve at least two purposes: 1. testing queries against a smaller data set, and 2. understanding what the data looks like without scanning the whole table. A simple function that returns a subset of the splits will help in those cases. It doesn't have to be strict sampling.
[jira] [Updated] (HIVE-2157) NPE in MapJoinObjectKey
[ https://issues.apache.org/jira/browse/HIVE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2157: - Component/s: Query Processor Fix Version/s: 0.8.0 NPE in MapJoinObjectKey --- Key: HIVE-2157 URL: https://issues.apache.org/jira/browse/HIVE-2157 Project: Hive Issue Type: Bug Components: Query Processor Reporter: He Yongqiang Assignee: He Yongqiang Fix For: 0.8.0 Attachments: HIVE-2157.1.patch
[jira] [Updated] (HIVE-2262) mapjoin followed by union all, groupby does not work
[ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2262: - Fix Version/s: (was: 0.7.1) mapjoin followed by union all, groupby does not work Key: HIVE-2262 URL: https://issues.apache.org/jira/browse/HIVE-2262 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: yu xiang Priority: Trivial SQL: CREATE TABLE nulltest2(int_data1 INT, int_data2 INT, boolean_data BOOLEAN, double_data DOUBLE, string_data STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE nulltest3(int_data1 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; explain select int_data2,count(1) from (select /*+mapjoin(a)*/ int_data2, 1 as c1, 0 as c2 from nulltest2 a join nulltest3 b on(a.int_data1 = b.int_data1) union all select /*+mapjoin(a)*/ int_data2, 1 as c1, 2 as c2 from nulltest2 a join nulltest3 b on(a.int_data1 = b.int_data1)) mapjointable group by int_data2; Exception: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:156) at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:551) at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514) at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:125) at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:76) at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink3.process(GenMRRedSink3.java:64) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) Analysis: 1. When mapjoin, union, and groupby are used together, UnionProcFactory.MapJoinUnion() (in the optimizer) sets MapJoinSubq to true and sets up the UnionParseContext. 2. In GenMRUnion1, Hive calls mergeMapJoinUnion and also sets the task plan.
3. In GenMRRedSink3, Hive checks uCtx.isMapOnlySubq() and calls GenMRRedSink1().process() to initialize the plan. But the utask's plan has already been set, so it only needs its reducer set. Also, because the utask is processing a temporary table, there is no topOp mapping to the table, so we get the null pointer exception. Solutions: 1. SQL solution: use a subquery to modify the SQL. 2. Code solution: in mergeMapJoinUnion, after the task plan has been set, set a setTaskPlan flag to true to indicate that the plan for this utask has already been set. Then in GenMRRedSink3, if this flag is true, don't call GenMRRedSink1().process() to reinitialize the plan, i.e. guard it with something like if (uCtx.isMapOnlySubq() && !upc.isIssetTaskPlan()). I don't know whether the code solution is suitable. Is there any better solution? Thanks.
[jira] [Updated] (HIVE-2306) Hbase's timestamp attribute to be mapped for read or write, and then import data of timestamp to hbase's table from hive
[ https://issues.apache.org/jira/browse/HIVE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianyi Zhang updated HIVE-2306: --- Description: The current column mapping doesn't support HBase's timestamp column being mapped for read or write, or importing timestamp data into an HBase table from Hive. I found that HIVE-1228 mentioned this issue, but it did not address the :timestamp requirement in the end. And https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration says that there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp. Would it be possible to allow the timestamp to be mapped to Hive (just like Get in the HBase API), or to INSERT OVERWRITE TABLE hbase_table_1 with a timestamp from Hive (like Put in the HBase API)? was: The current column mapping doesn't support HBase's timestamp column being mapped for read or write, or importing timestamp data into an HBase table from Hive. I found that HIVE-1228 mentioned this issue, but it did not address the :timestamp requirement in the end. And https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration says that there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp. This would allow the timestamp to be mapped to Hive (just like Get in the HBase API) or to INSERT OVERWRITE TABLE hbase_table_1 with a timestamp from Hive (like Put in the HBase API)? Hbase's timestamp attribute to be mapped for read or write, and then import data of timestamp to hbase's table from hive Key: HIVE-2306 URL: https://issues.apache.org/jira/browse/HIVE-2306 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Jianyi Zhang Original Estimate: 96h Remaining Estimate: 96h The current column mapping doesn't support HBase's timestamp column being mapped for read or write, or importing timestamp data into an HBase table from Hive. I found that HIVE-1228 mentioned this issue, but it did not address the :timestamp requirement in the end. 
And https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration says that there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp. Would it be possible to allow the timestamp to be mapped to Hive (just like Get in the HBase API), or to INSERT OVERWRITE TABLE hbase_table_1 with a timestamp from Hive (like Put in the HBase API)?