[jira] [Updated] (HIVE-2307) Schema creation scripts for PostgreSQL use bit(1) instead of boolean

2011-07-26 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HIVE-2307:


Attachment: HIVE-2307.1.patch.txt

 Schema creation scripts for PostgreSQL use bit(1) instead of boolean
 

 Key: HIVE-2307
 URL: https://issues.apache.org/jira/browse/HIVE-2307
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Metastore
Affects Versions: 0.5.0, 0.6.0, 0.7.0, 0.7.1
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
  Labels: metastore, postgres
 Attachments: HIVE-2307.1.patch.txt


 The DEFERRED_REBUILD (IDXS) and IS_COMPRESSED (SDS) columns in the metastore 
 schema are defined as type bit(1), which is not supported by the PostgreSQL 
 JDBC driver.
 hive> create table test (id int); 
 FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object 
 org.apache.hadoop.hive.metastore.model.MStorageDescriptor@4f1adeb7 using 
 statement INSERT INTO SDS 
 (SD_ID,INPUT_FORMAT,OUTPUT_FORMAT,LOCATION,SERDE_ID,NUM_BUCKETS,IS_COMPRESSED)
  VALUES (?,?,?,?,?,?,?) failed : ERROR: column "IS_COMPRESSED" is of type 
 bit but expression is of type boolean 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2307) Schema creation scripts for PostgreSQL use bit(1) instead of boolean

2011-07-26 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HIVE-2307:


Status: Patch Available  (was: Open)

 Schema creation scripts for PostgreSQL use bit(1) instead of boolean
 

 Key: HIVE-2307
 URL: https://issues.apache.org/jira/browse/HIVE-2307
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Metastore
Affects Versions: 0.7.1, 0.7.0, 0.6.0, 0.5.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
  Labels: metastore, postgres
 Attachments: HIVE-2307.1.patch.txt


 The DEFERRED_REBUILD (IDXS) and IS_COMPRESSED (SDS) columns in the metastore 
 schema are defined as type bit(1), which is not supported by the PostgreSQL 
 JDBC driver.
 hive> create table test (id int); 
 FAILED: Error in metadata: javax.jdo.JDODataStoreException: Insert of object 
 org.apache.hadoop.hive.metastore.model.MStorageDescriptor@4f1adeb7 using 
 statement INSERT INTO SDS 
 (SD_ID,INPUT_FORMAT,OUTPUT_FORMAT,LOCATION,SERDE_ID,NUM_BUCKETS,IS_COMPRESSED)
  VALUES (?,?,?,?,?,?,?) failed : ERROR: column "IS_COMPRESSED" is of type 
 bit but expression is of type boolean 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HIVE-1850) alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)

2011-07-26 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu reassigned HIVE-1850:
-

Assignee: Amareshwari Sriramadasu

 alter table set serdeproperties bypasses regexps checks (leaves table in a 
 non-recoverable state?)
 --

 Key: HIVE-1850
 URL: https://issues.apache.org/jira/browse/HIVE-1850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.7.0
 Environment: Trunk build from a few days ago, but seen once before 
 with older version as well.
Reporter: Terje Marthinussen
Assignee: Amareshwari Sriramadasu

 {code}
 create table aa ( test STRING )
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES ("input.regex" = "[^\\](.*)", "output.format.string" = "%1$s");
 {code}
 This will fail. Great!
 {code}
 create table aa ( test STRING )
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES ("input.regex" = "(.*)", "output.format.string" = "%1$s");
 {code}
 Works, no problem there.
 {code}
 alter table aa set serdeproperties ("input.regex" = "[^\\](.*)", "output.format.string" = "%1$s");
 {code}
 Wups... I can set that without any problems!
 {code}
 alter table aa set serdeproperties ("input.regex" = "(.*)", "output.format.string" = "%1$s");
 FAILED: Hive Internal Error: java.util.regex.PatternSyntaxException(Unclosed 
 character class near index 7
 [^\](.*)
^)
 java.util.regex.PatternSyntaxException: Unclosed character class near index 7
 [^\](.*)
^
   at java.util.regex.Pattern.error(Pattern.java:1713)
   at java.util.regex.Pattern.clazz(Pattern.java:2254)
   at java.util.regex.Pattern.sequence(Pattern.java:1818)
   at java.util.regex.Pattern.expr(Pattern.java:1752)
   at java.util.regex.Pattern.compile(Pattern.java:1460)
   at java.util.regex.Pattern.init(Pattern.java:1133)
   at java.util.regex.Pattern.compile(Pattern.java:847)
   at 
 org.apache.hadoop.hive.contrib.serde2.RegexSerDe.initialize(RegexSerDe.java:101)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:199)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
   at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:484)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:161)
   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:803)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableSerdeProps(DDLSemanticAnalyzer.java:558)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:232)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:686)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:142)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:370)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 {code}
 After this, all further commands on the table fail, including drop table :)
 1. The alter table command should probably check the regexp just like the 
 create table command does.
 2. Even though the regexp is bad, it should be possible to do things like set 
 the regexp again or drop the table.
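A minimal, hypothetical sketch of the up-front check suggested in (1): compile the candidate `input.regex` value before the alter is committed, the same validation RegexSerDe.initialize() effectively performs at create-table time. Class and method names here are illustrative, not Hive's actual API:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SerdeRegexCheck {
    /** Returns true if the candidate "input.regex" value compiles as a Java regex. */
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // e.g. "Unclosed character class near index 7" for [^\](.*)
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("(.*)"));      // true: valid pattern
        System.out.println(isValidRegex("[^\\](.*)")); // false: unclosed character class
    }
}
```

Rejecting the property here would leave the table's old, working serde properties in place instead of persisting a pattern that breaks every subsequent command.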

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-956) Add support of columnar binary serde

2011-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071016#comment-13071016
 ] 

Hudson commented on HIVE-956:
-

Integrated in Hive-trunk-h0.21 #849 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/849/])
HIVE-956: add support of columnar binary serde (Krishna Kumar via He 
Yongqiang)

heyongqiang : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1150978
Files : 
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDeBase.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/columnar
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java


 Add support of columnar binary serde
 

 Key: HIVE-956
 URL: https://issues.apache.org/jira/browse/HIVE-956
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: Krishna Kumar
 Attachments: HIVE-956v3.patch, HIVE-956v4.patch, HIVE.956.patch.0, 
 HIVE.956.patch.1, HIVE.956.patch.2




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-1850) alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)

2011-07-26 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1850:
--

Attachment: patch-1850.txt

Even though DDLTask.alterTable() does a checkValidity() on the table after all 
the alterations, this problem was not caught, because getDeserializer() was not 
getting the table from the Metastore with the modified properties. 
The patch makes the required change and adds a regression test. 

 alter table set serdeproperties bypasses regexps checks (leaves table in a 
 non-recoverable state?)
 --

 Key: HIVE-1850
 URL: https://issues.apache.org/jira/browse/HIVE-1850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.7.0
 Environment: Trunk build from a few days ago, but seen once before 
 with older version as well.
Reporter: Terje Marthinussen
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1850.txt


 {code}
 create table aa ( test STRING )
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES ("input.regex" = "[^\\](.*)", "output.format.string" = "%1$s");
 {code}
 This will fail. Great!
 {code}
 create table aa ( test STRING )
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES ("input.regex" = "(.*)", "output.format.string" = "%1$s");
 {code}
 Works, no problem there.
 {code}
 alter table aa set serdeproperties ("input.regex" = "[^\\](.*)", "output.format.string" = "%1$s");
 {code}
 Wups... I can set that without any problems!
 {code}
 alter table aa set serdeproperties ("input.regex" = "(.*)", "output.format.string" = "%1$s");
 FAILED: Hive Internal Error: java.util.regex.PatternSyntaxException(Unclosed 
 character class near index 7
 [^\](.*)
^)
 java.util.regex.PatternSyntaxException: Unclosed character class near index 7
 [^\](.*)
^
   at java.util.regex.Pattern.error(Pattern.java:1713)
   at java.util.regex.Pattern.clazz(Pattern.java:2254)
   at java.util.regex.Pattern.sequence(Pattern.java:1818)
   at java.util.regex.Pattern.expr(Pattern.java:1752)
   at java.util.regex.Pattern.compile(Pattern.java:1460)
   at java.util.regex.Pattern.init(Pattern.java:1133)
   at java.util.regex.Pattern.compile(Pattern.java:847)
   at 
 org.apache.hadoop.hive.contrib.serde2.RegexSerDe.initialize(RegexSerDe.java:101)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:199)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
   at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:484)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:161)
   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:803)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableSerdeProps(DDLSemanticAnalyzer.java:558)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:232)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:686)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:142)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:370)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 {code}
 After this, all further commands on the table fail, including drop table :)
 1. The alter table command should probably check the regexp just like the 
 create table command does.
 2. Even though the regexp is bad, it should be possible to do things like set 
 the regexp again or drop the table.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-1850) alter table set serdeproperties bypasses regexps checks (leaves table in a non-recoverable state?)

2011-07-26 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1850:
--

Fix Version/s: 0.8.0
   Status: Patch Available  (was: Open)

 alter table set serdeproperties bypasses regexps checks (leaves table in a 
 non-recoverable state?)
 --

 Key: HIVE-1850
 URL: https://issues.apache.org/jira/browse/HIVE-1850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.7.0
 Environment: Trunk build from a few days ago, but seen once before 
 with older version as well.
Reporter: Terje Marthinussen
Assignee: Amareshwari Sriramadasu
 Fix For: 0.8.0

 Attachments: patch-1850.txt


 {code}
 create table aa ( test STRING )
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES ("input.regex" = "[^\\](.*)", "output.format.string" = "%1$s");
 {code}
 This will fail. Great!
 {code}
 create table aa ( test STRING )
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES ("input.regex" = "(.*)", "output.format.string" = "%1$s");
 {code}
 Works, no problem there.
 {code}
 alter table aa set serdeproperties ("input.regex" = "[^\\](.*)", "output.format.string" = "%1$s");
 {code}
 Wups... I can set that without any problems!
 {code}
 alter table aa set serdeproperties ("input.regex" = "(.*)", "output.format.string" = "%1$s");
 FAILED: Hive Internal Error: java.util.regex.PatternSyntaxException(Unclosed 
 character class near index 7
 [^\](.*)
^)
 java.util.regex.PatternSyntaxException: Unclosed character class near index 7
 [^\](.*)
^
   at java.util.regex.Pattern.error(Pattern.java:1713)
   at java.util.regex.Pattern.clazz(Pattern.java:2254)
   at java.util.regex.Pattern.sequence(Pattern.java:1818)
   at java.util.regex.Pattern.expr(Pattern.java:1752)
   at java.util.regex.Pattern.compile(Pattern.java:1460)
   at java.util.regex.Pattern.init(Pattern.java:1133)
   at java.util.regex.Pattern.compile(Pattern.java:847)
   at 
 org.apache.hadoop.hive.contrib.serde2.RegexSerDe.initialize(RegexSerDe.java:101)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:199)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
   at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:484)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:161)
   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:803)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableSerdeProps(DDLSemanticAnalyzer.java:558)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:232)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:686)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:142)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:370)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 {code}
 After this, all further commands on the table fail, including drop table :)
 1. The alter table command should probably check the regexp just like the 
 create table command does.
 2. Even though the regexp is bad, it should be possible to do things like set 
 the regexp again or drop the table.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-07-26 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071130#comment-13071130
 ] 

jirapos...@reviews.apache.org commented on HIVE-1694:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

Review request for hive and John Sichi.


Summary
---

This patch defines a new AggregateIndexHandler that is used to optimize the 
query plan for group-by queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta



 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in the optimizer & 
 query execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain classes of queries), e.g.:
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the 
 above mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-07-26 Thread Prajakta Kalmegh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prajakta Kalmegh updated HIVE-1694:
---

Attachment: HIVE-1694.4.patch

Review changes done after the last review; added new functionality (see the 
follow-up comment for more details).

 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff, 
 demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in the optimizer & 
 query execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain classes of queries), e.g.:
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the 
 above mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-07-26 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071135#comment-13071135
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Hi John,

Please find attached the latest patch (HIVE-1694.4.patch).
The patch contains:
1. Support for multiple aggregates in index creation using the 
AggregateIndexHandler. The column names for the index schema are constructed 
dynamically depending on the aggregates. 
For 'aggregateFunction(columnName)', the column name in the index will be 
`_aggregateFunction_of_columnName`. 
For example, for count(l_shipdate), the column name will be 
`_count_of_l_shipdate`.
For the 'count(*)' function, the column name will be `_count_of_all`.

2. Fixed the bug with duplicates in the group-by removal cases. We are no 
longer removing the group-by in any case, which has made the query-rewrite 
logic much simpler than before. 
We removed 4 classes (RewriteIndexSubqueryCtx.java, 
RewriteIndexSubqueryProcFactory.java, RewriteRemoveGroupbyCtx.java, 
RewriteRemoveGroupbyProcFactory.java) from the previous patch and added two 
new, simpler classes instead (RewriteQueryUsingAggregateIndex.java, 
RewriteQueryUsingAggregateIndexCtx.java). 

3. Added a new query (with 'UNION ALL') in the same ql_rewrite_gbtoidx.q file 
to demonstrate your requirement from the last post. Please note that the query 
is not a valid real-world use case; but it still suffices for our purpose of 
seeing that the rewrite of one branch does not corrupt the other branch.

4. The rewrite optimization now happens after the PredicatePushdown, 
PartitionPruner and PartitionConditionRemover.
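The column-name derivation described in (1) can be sketched as a tiny helper; the class and method names here are illustrative, not the patch's actual code:

```java
public class AggregateIndexColumnName {
    /**
     * Derives the index-schema column name for aggregateFunction(columnName):
     * `_aggregateFunction_of_columnName`, with count(*) mapped to `_count_of_all`.
     */
    static String indexColumnName(String function, String column) {
        if ("*".equals(column)) {
            return "_" + function + "_of_all";
        }
        return "_" + function + "_of_" + column;
    }

    public static void main(String[] args) {
        System.out.println(indexColumnName("count", "l_shipdate")); // _count_of_l_shipdate
        System.out.println(indexColumnName("count", "*"));          // _count_of_all
    }
}
```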

This patch does not contain:
1. Optimization for cases with multiple aggregates in the selection
2. Optimization for any aggregate function other than count
3. Optimization for queries involving multiple tables (even if they are in a 
different branch). Since we are not optimizing for the case of joins, this 
constraint also filters out union queries whose branches read different tables.
4. Optimizations for an index with multiple columns in its key

Here is the review board link for the patch: 
https://reviews.apache.org/r/1194/.

Please let me know if you have any questions.


 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff, 
 demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in the optimizer & 
 query execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain classes of queries), e.g.:
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the 
 above mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Work started] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-07-26 Thread Prajakta Kalmegh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-1694 started by Prajakta Kalmegh.

 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff, 
 demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in the optimizer & 
 query execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain classes of queries), e.g.:
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the 
 above mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes

2011-07-26 Thread Prajakta Kalmegh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/
---

Review request for hive and John Sichi.


Summary
---

This patch defines a new AggregateIndexHandler that is used to optimize the 
query plan for group-by queries. 


This addresses bug HIVE-1694.
https://issues.apache.org/jira/browse/HIVE-1694


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
  ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 
  ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
  ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1194/diff


Testing
---


Thanks,

Prajakta



[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.

2011-07-26 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071291#comment-13071291
 ] 

Paul Yang commented on HIVE-2226:
-

Committed. Thanks Sohan!

 Add API to retrieve table names by an arbitrary filter, e.g., by owner, 
 retention, parameters, etc.
 ---

 Key: HIVE-2226
 URL: https://issues.apache.org/jira/browse/HIVE-2226
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Fix For: 0.8.0

 Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch


 Create a function called get_table_names_by_filter that returns a list of 
 table names in a database that match a certain filter.  The filter should 
 operate similarly to the one in HIVE-1609.  Initially, you should be able to prune 
 the table list based on owner, retention, or table parameter key/values.  The 
 filtering should take place at the JDO level for efficiency/speed.
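
The intended filter semantics can be sketched as follows. This is a hypothetical illustration of the pruning cases the ticket names (owner, retention, parameter key/values), not the actual JDO-level implementation; the function and field names are assumptions.

```python
# Hypothetical sketch of the filter semantics, not the JDO implementation:
# prune a table list by owner, retention, or a parameter key/value.
def filter_table_names(tables, owner=None, retention=None, param=None):
    """tables: list of dicts with 'name', 'owner', 'retention', 'parameters'."""
    out = []
    for t in tables:
        if owner is not None and t.get("owner") != owner:
            continue
        if retention is not None and t.get("retention") != retention:
            continue
        if param is not None:
            key, value = param
            if t.get("parameters", {}).get(key) != value:
                continue
        out.append(t["name"])
    return out

tables = [
    {"name": "a", "owner": "alice", "retention": 30, "parameters": {"tier": "gold"}},
    {"name": "b", "owner": "bob", "retention": 30, "parameters": {}},
]
print(filter_table_names(tables, owner="alice"))  # ['a']
```

The real API pushes this filtering into the metastore's JDO query instead of fetching all tables client-side, which is the efficiency point the description makes.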

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.

2011-07-26 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-2226:


   Resolution: Fixed
Fix Version/s: 0.8.0
   Status: Resolved  (was: Patch Available)

 Add API to retrieve table names by an arbitrary filter, e.g., by owner, 
 retention, parameters, etc.
 ---

 Key: HIVE-2226
 URL: https://issues.apache.org/jira/browse/HIVE-2226
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Fix For: 0.8.0

 Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch


 Create a function called get_table_names_by_filter that returns a list of 
 table names in a database that match a certain filter.  The filter should 
 operate similar to the one HIVE-1609.  Initially, you should be able to prune 
 the table list based on owner, retention, or table parameter key/values.  The 
 filtering should take place at the JDO level for efficiency/speed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2272) add TIMESTAMP data type

2011-07-26 Thread Franklin Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Franklin Hu updated HIVE-2272:
--

Attachment: hive-2272.6.patch

rebase 

 add TIMESTAMP data type
 ---

 Key: HIVE-2272
 URL: https://issues.apache.org/jira/browse/HIVE-2272
 Project: Hive
  Issue Type: New Feature
Reporter: Franklin Hu
Assignee: Franklin Hu
 Attachments: hive-2272.1.patch, hive-2272.2.patch, hive-2272.3.patch, 
 hive-2272.4.patch, hive-2272.5.patch, hive-2272.6.patch


 Add TIMESTAMP type to serde2 that supports unix timestamp (1970-01-01 
 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision 
 using both LazyBinary and LazySimple SerDes. 
 For LazySimpleSerDe, the data is stored as JDBC-compliant, 
 java.sql.Timestamp-parsable strings.
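
Parsing such a string while keeping nanosecond precision can be sketched like this. This is an illustrative example of the java.sql.Timestamp string shape ("YYYY-MM-DD HH:MM:SS[.fff...]"), not Hive's SerDe code; the helper name is an assumption.

```python
from datetime import datetime

# Illustrative sketch, not Hive's SerDe code: parse a
# java.sql.Timestamp-style string, keeping up to nanosecond
# precision in a separate integer (datetime only holds microseconds).
def parse_jdbc_timestamp(s):
    if "." in s:
        base, frac = s.split(".", 1)
        nanos = int(frac.ljust(9, "0")[:9])  # pad/truncate fraction to 9 digits
    else:
        base, nanos = s, 0
    return datetime.strptime(base, "%Y-%m-%d %H:%M:%S"), nanos

dt, nanos = parse_jdbc_timestamp("2038-01-19 03:14:07.123456789")
```

Carrying the nanoseconds separately mirrors why a plain unix-seconds value is not enough for this type.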

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: HIVE-2286: ClassCastException when building index with security.authorization turned on

2011-07-26 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1137/#review1188
---



ql/src/java/org/apache/hadoop/hive/ql/Driver.java
https://reviews.apache.org/r/1137/#comment2597

java.util.Stack is deprecated since it adds unnecessary synchronization.  
We don't have a replacement yet (HIVE-1626) so we've just been using ArrayList.

Also, instead of typecasting to/from Object, use a static inner class for 
holding the record of state variables.


- John


On 2011-07-25 23:03:22, Syed Albiz wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1137/
 ---
 
 (Updated 2011-07-25 23:03:22)
 
 
 Review request for hive, John Sichi and Ning Zhang.
 
 
 Summary
 ---
 
 Save the original HiveOperation/commandType when we generate the index 
 builder task and restore it after we're done generating the task so that the 
 authorization checks make the right decision when deciding what to do.
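
The save/restore pattern the summary describes can be sketched in Python (the actual fix lives in Driver.java; the dictionary keys and function names here are illustrative assumptions):

```python
# Sketch of the save/restore pattern: preserve the operation state
# around generation of a subtask, so later authorization checks see
# the original values even if the generator overwrote them.
def run_with_saved_state(ctx, generate_subtask):
    saved = (ctx["hive_operation"], ctx["command_type"])
    try:
        generate_subtask(ctx)  # may overwrite the state
    finally:
        # restore the original values no matter what happened
        ctx["hive_operation"], ctx["command_type"] = saved

ctx = {"hive_operation": "CREATEINDEX", "command_type": "CREATEINDEX"}

def index_builder(c):
    c["hive_operation"] = "QUERY"  # builder task rewrites the state

run_with_saved_state(ctx, index_builder)
print(ctx["hive_operation"])  # CREATEINDEX
```

Using try/finally guarantees restoration even if subtask generation throws, which is what makes the downstream checks reliable.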
 
 
 This addresses bug HIVE-2286.
 https://issues.apache.org/jira/browse/HIVE-2286
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe 
   ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION 
   ql/src/test/results/clientnegative/index_compact_entry_limit.q.out fcb2673 
   ql/src/test/results/clientnegative/index_compact_size_limit.q.out fcb2673 
   ql/src/test/results/clientpositive/index_auth.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/index_auto.q.out 8d65f98 
   ql/src/test/results/clientpositive/index_auto_file_format.q.out 194b35e 
   ql/src/test/results/clientpositive/index_auto_multiple.q.out 6b81fc3 
   ql/src/test/results/clientpositive/index_auto_partitioned.q.out b0635db 
   ql/src/test/results/clientpositive/index_auto_unused.q.out 3631bbc 
   ql/src/test/results/clientpositive/index_bitmap.q.out 8f41ce3 
   ql/src/test/results/clientpositive/index_bitmap1.q.out 9f638f5 
   ql/src/test/results/clientpositive/index_bitmap2.q.out e901477 
   ql/src/test/results/clientpositive/index_bitmap3.q.out 116c973 
   ql/src/test/results/clientpositive/index_bitmap_auto.q.out cc9d91e 
   ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 
 9003eb4 
   ql/src/test/results/clientpositive/index_bitmap_rc.q.out 9bd3c98 
   ql/src/test/results/clientpositive/index_compact.q.out c339ec9 
   ql/src/test/results/clientpositive/index_compact_1.q.out 34ba3ca 
   ql/src/test/results/clientpositive/index_compact_2.q.out e8ce238 
   ql/src/test/results/clientpositive/index_compact_3.q.out d39556d 
   ql/src/test/results/clientpositive/index_creation.q.out 532f07e 
 
 Diff: https://reviews.apache.org/r/1137/diff
 
 
 Testing
 ---
 
 Added new testcase to TestCliDriver: index_auth.q
 
 
 Thanks,
 
 Syed
 




[jira] [Commented] (HIVE-2286) ClassCastException when building index with security.authorization turned on

2011-07-26 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071315#comment-13071315
 ] 

jirapos...@reviews.apache.org commented on HIVE-2286:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1137/#review1188
---



ql/src/java/org/apache/hadoop/hive/ql/Driver.java
https://reviews.apache.org/r/1137/#comment2597

java.util.Stack is deprecated since it adds unnecessary synchronization.  
We don't have a replacement yet (HIVE-1626) so we've just been using ArrayList.

Also, instead of typecasting to/from Object, use a static inner class for 
holding the record of state variables.


- John


On 2011-07-25 23:03:22, Syed Albiz wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1137/
bq.  ---
bq.  
bq.  (Updated 2011-07-25 23:03:22)
bq.  
bq.  
bq.  Review request for hive, John Sichi and Ning Zhang.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Save the original HiveOperation/commandType when we generate the index 
builder task and restore it after we're done generating the task so that the 
authorization checks make the right decision when deciding what to do.
bq.  
bq.  
bq.  This addresses bug HIVE-2286.
bq.  https://issues.apache.org/jira/browse/HIVE-2286
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe 
bq.ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION 
bq.ql/src/test/results/clientnegative/index_compact_entry_limit.q.out 
fcb2673 
bq.ql/src/test/results/clientnegative/index_compact_size_limit.q.out 
fcb2673 
bq.ql/src/test/results/clientpositive/index_auth.q.out PRE-CREATION 
bq.ql/src/test/results/clientpositive/index_auto.q.out 8d65f98 
bq.ql/src/test/results/clientpositive/index_auto_file_format.q.out 194b35e 
bq.ql/src/test/results/clientpositive/index_auto_multiple.q.out 6b81fc3 
bq.ql/src/test/results/clientpositive/index_auto_partitioned.q.out b0635db 
bq.ql/src/test/results/clientpositive/index_auto_unused.q.out 3631bbc 
bq.ql/src/test/results/clientpositive/index_bitmap.q.out 8f41ce3 
bq.ql/src/test/results/clientpositive/index_bitmap1.q.out 9f638f5 
bq.ql/src/test/results/clientpositive/index_bitmap2.q.out e901477 
bq.ql/src/test/results/clientpositive/index_bitmap3.q.out 116c973 
bq.ql/src/test/results/clientpositive/index_bitmap_auto.q.out cc9d91e 
bq.ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 
9003eb4 
bq.ql/src/test/results/clientpositive/index_bitmap_rc.q.out 9bd3c98 
bq.ql/src/test/results/clientpositive/index_compact.q.out c339ec9 
bq.ql/src/test/results/clientpositive/index_compact_1.q.out 34ba3ca 
bq.ql/src/test/results/clientpositive/index_compact_2.q.out e8ce238 
bq.ql/src/test/results/clientpositive/index_compact_3.q.out d39556d 
bq.ql/src/test/results/clientpositive/index_creation.q.out 532f07e 
bq.  
bq.  Diff: https://reviews.apache.org/r/1137/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Added new testcase to TestCliDriver: index_auth.q
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Syed
bq.  
bq.



 ClassCastException when building index with security.authorization turned on
 

 Key: HIVE-2286
 URL: https://issues.apache.org/jira/browse/HIVE-2286
 Project: Hive
  Issue Type: Bug
Reporter: Syed S. Albiz
Assignee: Syed S. Albiz
 Attachments: HIVE-2286.1.patch, HIVE-2286.2.patch


 When trying to build an index with authorization checks turned on, hive 
 issues the following ClassCastException:
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer cannot be cast to
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
  at
 org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:540)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:848)
  at
 org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:224)
  at
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358)
  at
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:293)
  at
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:385)
  at
 org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:392)
  at
 

[jira] [Commented] (HIVE-2020) Create a separate namespace for Hive variables

2011-07-26 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071318#comment-13071318
 ] 

Vaibhav Aggarwal commented on HIVE-2020:


I propose to use -d, --define to define Hive variables.
Amazon Elastic MapReduce already uses this notation for Hive variables and 
variable substitution.

This approach would also clearly separate the use of -hiveconf from -d or 
--define, which would be used purely to set Hive variables.

This would also maintain consistency for Hive users.

 Create a separate namespace for Hive variables
 --

 Key: HIVE-2020
 URL: https://issues.apache.org/jira/browse/HIVE-2020
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Carl Steinbach

 Support for variable substitution was added in HIVE-1096. However, variable 
 substitution was implemented by reusing the HiveConf namespace, so there is 
 no separation between Hive configuration properties and Hive variables.
 This ticket encompasses the following enhancements:
 * Create a separate namespace for managing Hive variables.
 * Add support for setting variables on the command line via '-hivevar x=y'
 * Add support for setting variables through the CLI via 'var x=y'
 * Add support for referencing variables in statements using either 
 '${hivevar:var_name}' or '${var_name}'
 * Provide a means for differentiating between hiveconf, hivevar, system, and 
 environment properties in the output of 'set -v'
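
The proposed namespaced substitution can be sketched as follows. This is a hypothetical illustration of the `${hivevar:x}` / `${x}` resolution rules listed above, not Hive's VariableSubstitution class; the namespace table and default-to-hivevar choice are assumptions.

```python
import re

# Sketch of namespaced variable substitution: '${hivevar:x}' and bare
# '${x}' resolve from the hivevar namespace, '${hiveconf:k}' from conf.
namespaces = {
    "hivevar": {"x": "y"},
    "hiveconf": {"hive.exec.parallel": "true"},
}

def substitute(stmt):
    def repl(m):
        ns = m.group("ns") or "hivevar"  # assumed default namespace
        name = m.group("name")
        # leave unresolvable references untouched
        return namespaces.get(ns, {}).get(name, m.group(0))
    return re.sub(r"\$\{(?:(?P<ns>\w+):)?(?P<name>[\w.]+)\}", repl, stmt)

print(substitute("SELECT * FROM t WHERE c = '${hivevar:x}'"))
```

Keeping each namespace in its own map is what gives the separation from HiveConf that the ticket asks for.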

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2305) UNION ALL on different types throws runtime exception

2011-07-26 Thread Franklin Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Franklin Hu updated HIVE-2305:
--

Attachment: hive-2305.2.patch

fix upstream input file change propagation

 UNION ALL on different types throws runtime exception
 -

 Key: HIVE-2305
 URL: https://issues.apache.org/jira/browse/HIVE-2305
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Franklin Hu
Assignee: Franklin Hu
 Attachments: hive-2305.1.patch, hive-2305.2.patch


 Ex:
 SELECT * FROM (SELECT 123 FROM ... UNION ALL SELECT '123' FROM ..) t;
 Unioning columns of different types currently throws runtime exceptions.
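
One way to avoid the runtime failure is to resolve a common type for the branches before execution. The sketch below is a hedged illustration of such widening rules, not Hive's actual resolution logic (which lives around FunctionRegistry); the ordering and string fallback are assumptions.

```python
# Hedged sketch of common-type resolution for UNION ALL branches:
# widen within the numeric ladder, otherwise fall back to string.
NUMERIC_WIDENING = ["tinyint", "smallint", "int", "bigint", "float", "double"]

def common_type(a, b):
    if a == b:
        return a
    if a in NUMERIC_WIDENING and b in NUMERIC_WIDENING:
        # pick the wider of the two numeric types
        return NUMERIC_WIDENING[max(NUMERIC_WIDENING.index(a),
                                    NUMERIC_WIDENING.index(b))]
    return "string"  # e.g. 123 UNION ALL '123' -> string

print(common_type("int", "string"))  # string
```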

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HIVE-1143) CREATE VIEW followup: updatable views

2011-07-26 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1143:


Assignee: Charles Chen  (was: Carl Steinbach)

 CREATE VIEW followup:  updatable views
 --

 Key: HIVE-1143
 URL: https://issues.apache.org/jira/browse/HIVE-1143
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Charles Chen

 For HIVE-972, we only implemented read-only views.  Updatable views are 
 difficult in general, but for simple cases where views are being used to 
 impose a rename layer on existing tables/columns, update support would be 
 high value (for consistent read/write access) and not a lot of work.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HIVE-1989) recognize transitivity of predicates on join keys

2011-07-26 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1989:


Assignee: Charles Chen

 recognize transitivity of predicates on join keys
 -

 Key: HIVE-1989
 URL: https://issues.apache.org/jira/browse/HIVE-1989
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Charles Chen

 Given
 {noformat}
 set hive.mapred.mode=strict;
 create table invites (foo int, bar string) partitioned by (ds string);
 create table invites2 (foo int, bar string) partitioned by (ds string);
 select count(*) from invites join invites2 on invites.ds=invites2.ds and 
 invites.ds='2011-01-01';
 {noformat}
 currently an error occurs:
 {noformat}
 Error in semantic analysis: No Partition Predicate Found for Alias invites2 
 Table invites2
 {noformat}
 The optimizer should be able to infer a predicate on invites2 via 
 transitivity.  The current lack of inference places a burden on the user to 
 add a redundant predicate, and (at least in strict mode) makes join views 
 where both underlying tables are partitioned impossible (the join select list 
 has to pick one of the tables arbitrarily).
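
The inference being asked for can be sketched as constant propagation across join-key equalities. This is an illustrative fixed-point sketch, not the optimizer's implementation; the data representation is an assumption.

```python
# Sketch of transitive predicate inference: if col_a = col_b (a join
# key equality) and col_a = constant, then col_b = constant too.
def infer_transitive(equalities, constants):
    """equalities: list of frozensets of column pairs; constants: col -> value."""
    inferred = dict(constants)
    changed = True
    while changed:  # iterate to a fixed point for chains of equalities
        changed = False
        for pair in equalities:
            a, b = tuple(pair)
            for x, y in ((a, b), (b, a)):
                if x in inferred and y not in inferred:
                    inferred[y] = inferred[x]
                    changed = True
    return inferred

eq = [frozenset({("invites", "ds"), ("invites2", "ds")})]
const = {("invites", "ds"): "2011-01-01"}
print(infer_transitive(eq, const)[("invites2", "ds")])  # 2011-01-01
```

With the inferred predicate on invites2.ds, strict mode's partition-predicate check would pass without the user adding a redundant clause.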

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2123) CommandNeedRetryException needs release locks

2011-07-26 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071327#comment-13071327
 ] 

John Sichi commented on HIVE-2123:
--

This one has been sitting in Patch Available queue for a while...anything 
holding it up?


 CommandNeedRetryException needs release locks
 -

 Key: HIVE-2123
 URL: https://issues.apache.org/jira/browse/HIVE-2123
 Project: Hive
  Issue Type: Bug
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-2123.1.patch, HIVE-2123.2.patch, HIVE-2123.3.patch, 
 HIVE-2123.4.patch


 Currently, when CommandNeedRetryException is thrown, locks are not released. 
 It is not clear whether this causes a problem, since the same locks will be 
 acquired when retrying, but it is something we need to fix anyway. We can 
 also do some small code cleanup to make future mistakes less likely.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions

2011-07-26 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071328#comment-13071328
 ] 

John Sichi commented on HIVE-2242:
--

This one has been sitting in Patch Available queue for a while...anything 
holding it up?


 DDL Semantic Analyzer does not pass partial specification partitions to 
 PreExecute hooks when dropping partitions
 -

 Key: HIVE-2242
 URL: https://issues.apache.org/jira/browse/HIVE-2242
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2242.1.patch


 Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
 partitions that have a full specification to pre-execution hooks.  It should 
 also include all matches from partial specifications.
 E.g., suppose you have a table
 {{create table test_table (a string) partitioned by (p1 string, p2 string);}}
 {{alter table test_table add partition (p1=1, p2=1);}}
 {{alter table test_table add partition (p1=1, p2=2);}}
 {{alter table test_table add partition (p1=2, p2=2);}}
 and you run 
 {{alter table test_table drop partition(p1=1);}}
 Pre-execution hooks will not be passed any of the partitions.  The expected 
 behavior is for pre-execution hooks to receive WriteEntity objects for the 
 partitions p1=1/p2=1 and p1=1/p2=2.
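
The partial-spec matching the report expects can be sketched like this (a hypothetical helper, not the DDL analyzer's code): a partition matches when every key in the partial spec agrees with it, and unmentioned keys act as wildcards.

```python
# Hypothetical sketch of partial partition-spec matching: keys absent
# from the spec are wildcards.
def matches(partition, partial_spec):
    return all(partition.get(k) == v for k, v in partial_spec.items())

partitions = [
    {"p1": "1", "p2": "1"},
    {"p1": "1", "p2": "2"},
    {"p1": "2", "p2": "2"},
]
# drop partition (p1=1) should select both p1=1 partitions for the hooks
dropped = [p for p in partitions if matches(p, {"p1": "1"})]
print(dropped)
```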

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2065) RCFile issues

2011-07-26 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071330#comment-13071330
 ] 

John Sichi commented on HIVE-2065:
--

This one has been sitting in Patch Available queue for a while...are there 
issues that still need to be resolved?



 RCFile issues
 -

 Key: HIVE-2065
 URL: https://issues.apache.org/jira/browse/HIVE-2065
 Project: Hive
  Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
 Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt, 
 Slide1.png, proposal.png


 Some potential issues with RCFile
 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
 yongqiang he, the class is not meant to be thread-safe (and it is not). Might 
 as well get rid of the confusing and performance-impacting lock acquisitions.
 2. The record length is overstated for compressed files. IIUC, the key 
 compression happens after we have written the record length.
 {code}
   int keyLength = key.getSize();
   if (keyLength < 0) {
 throw new IOException("negative length keys not allowed: " + key);
   }
   out.writeInt(keyLength + valueLength); // total record length
   out.writeInt(keyLength); // key portion length
   if (!isCompressed()) {
 out.writeInt(keyLength);
 key.write(out); // key
   } else {
 keyCompressionBuffer.reset();
 keyDeflateFilter.resetState();
 key.write(keyDeflateOut);
 keyDeflateOut.flush();
 keyDeflateFilter.finish();
 int compressedKeyLen = keyCompressionBuffer.getLength();
 out.writeInt(compressedKeyLen);
 out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
   }
 {code}
 3. For sequence file compatibility, the compressed key length should be the 
 next field to record length, not the uncompressed key length.
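
Points 2 and 3 can be illustrated with a framing sketch: compress the key first, then write lengths that reflect the compressed size. This is an assumed, simplified layout for illustration, not RCFile's exact on-disk format.

```python
import io
import struct
import zlib

# Illustrative framing sketch (not RCFile's exact format): when the
# key is compressed, both the record length and the key-portion field
# that follows it count the *compressed* key size.
def write_record(out, key, value, compressed=True):
    key_bytes = zlib.compress(key) if compressed else key
    out.write(struct.pack(">i", len(key_bytes) + len(value)))  # record length
    out.write(struct.pack(">i", len(key_bytes)))               # key portion length
    out.write(key_bytes)
    out.write(value)

buf = io.BytesIO()
write_record(buf, b"k" * 100, b"v" * 10)
# the stated record length matches what was actually written
record_len = struct.unpack(">i", buf.getvalue()[:4])[0]
```

Writing the compressed length immediately after the record length is the field ordering point 3 argues for.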

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: HIVE-2272: add TIMESTAMP data type

2011-07-26 Thread Franklin Hu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1135/
---

(Updated 2011-07-26 21:11:35.218104)


Review request for hive.


Changes
---

Rebase


Summary
---

Adds TIMESTAMP type to serde2 with both string (LazySimple) and binary 
(LazyBinary) serialization.
Supports SQL-style JDBC timestamps with nanosecond precision in the format
YYYY-MM-DD HH:MM:SS[.fff...]


This addresses bug HIVE-2272.
https://issues.apache.org/jira/browse/HIVE-2272


Diffs (updated)
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDate.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateAdd.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDayOfMonth.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMinute.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMonth.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSecond.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFWeekOfYear.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFYear.java 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovariance.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovarianceSample.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java
 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java 
1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStdSample.java
 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 
1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java 
1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java
 PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/invalid_t_create3.q 1151189 
  trunk/ql/src/test/queries/clientpositive/timestamp_1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/timestamp_2.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/timestamp_3.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/timestamp_comparison.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/timestamp_udf.q PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/invalid_create_tbl1.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_alter1.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_alter2.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_create1.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_create2.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_transform.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/wrong_column_type.q.out 1151189 
  trunk/ql/src/test/results/clientpositive/show_functions.q.out 1151189 
  trunk/ql/src/test/results/clientpositive/timestamp_1.q.out PRE-CREATION 
  

[jira] [Commented] (HIVE-2272) add TIMESTAMP data type

2011-07-26 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071368#comment-13071368
 ] 

jirapos...@reviews.apache.org commented on HIVE-2272:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1135/
---

(Updated 2011-07-26 21:11:35.218104)


Review request for hive.


Changes
---

Rebase


Summary
---

Adds TIMESTAMP type to serde2 with both string (LazySimple) and binary 
(LazyBinary) serialization.
Supports SQL-style JDBC timestamps with nanosecond precision in the format
YYYY-MM-DD HH:MM:SS[.fff...]


This addresses bug HIVE-2272.
https://issues.apache.org/jira/browse/HIVE-2272


Diffs (updated)
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDate.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateAdd.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDayOfMonth.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMinute.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMonth.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSecond.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFWeekOfYear.java 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFYear.java 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovariance.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovarianceSample.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java
 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java 
1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStdSample.java
 1151189 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 
1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java
 1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java 
1151189 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java
 PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/invalid_t_create3.q 1151189 
  trunk/ql/src/test/queries/clientpositive/timestamp_1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/timestamp_2.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/timestamp_3.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/timestamp_comparison.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/timestamp_udf.q PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/invalid_create_tbl1.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_alter1.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_alter2.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_create1.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_create2.q.out 1151189 
  trunk/ql/src/test/results/clientnegative/invalid_t_transform.q.out 

[jira] [Created] (HIVE-2308) Throw an error if user specifies unsupported FS in LOCATION clause of CREATE TABLE

2011-07-26 Thread Carl Steinbach (JIRA)
Throw an error if user specifies unsupported FS in LOCATION clause of CREATE 
TABLE
--

 Key: HIVE-2308
 URL: https://issues.apache.org/jira/browse/HIVE-2308
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Carl Steinbach




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename

2011-07-26 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-2309:


Attachment: HIVE-2309.1.patch

 Incorrect regular expression for extracting task id from filename
 -

 Key: HIVE-2309
 URL: https://issues.apache.org/jira/browse/HIVE-2309
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.1
Reporter: Paul Yang
Priority: Minor
 Attachments: HIVE-2309.1.patch


 For producing the correct filenames for bucketed tables, there is a method in 
 Utilities.java that extracts out the task id from the filename and replaces 
 it with the bucket number. There is a bug in the regex that is used to 
 extract this value for attempt numbers >= 10:
 {code}
  >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
  ...          'attempt_201107090429_64965_m_001210_10').group(1)
 '10'
  >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
  ...          'attempt_201107090429_64965_m_001210_9').group(1)
 '001210'
 {code}
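The fix can be sketched by widening the optional attempt-suffix group so that multi-digit attempt numbers no longer capture as the task id. This is only an illustrative sketch in the same Python style as the report, not the attached HIVE-2309 patch:

```python
import re

# Hypothetical fix: let the trailing attempt suffix be one or more digits,
# so the lazy prefix stops before the task id instead of before the attempt id.
PATTERN = r"^.*?([0-9]+)(_[0-9]+)?(\..*)?$"

def task_id(filename):
    """Return the task id portion of a task attempt filename, or None."""
    m = re.match(PATTERN, filename)
    return m.group(1) if m else None

# Both single- and double-digit attempt numbers now yield the task id.
print(task_id('attempt_201107090429_64965_m_001210_9'))   # 001210
print(task_id('attempt_201107090429_64965_m_001210_10'))  # 001210
```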





[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename

2011-07-26 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071409#comment-13071409
 ] 

Siying Dong commented on HIVE-2309:
---

Can we limit the number of digits for the attempt ID?

 Incorrect regular expression for extracting task id from filename
 -

 Key: HIVE-2309
 URL: https://issues.apache.org/jira/browse/HIVE-2309
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.1
Reporter: Paul Yang
Assignee: Paul Yang
Priority: Minor
 Attachments: HIVE-2309.1.patch


 For producing the correct filenames for bucketed tables, there is a method in 
 Utilities.java that extracts out the task id from the filename and replaces 
 it with the bucket number. There is a bug in the regex that is used to 
 extract this value for attempt numbers >= 10:
 {code}
  >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
  ...          'attempt_201107090429_64965_m_001210_10').group(1)
 '10'
  >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
  ...          'attempt_201107090429_64965_m_001210_9').group(1)
 '001210'
 {code}





[jira] [Commented] (HIVE-2231) Column aliases

2011-07-26 Thread Adam Kramer (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071412#comment-13071412
 ] 

Adam Kramer commented on HIVE-2231:
---

The use case here is basically providing backwards compatibility. Many existing 
users and many new users of a table are using the same table and want to refer 
to it as such; it is the canonical table.

But sometimes the table was originally created with crummy names, and it'd be 
better and cleaner to document, and to train new people on, the appropriate names.

Views eat up the namespace and add a level of indirection that is not always 
desirable, but here are the two biggest limitations of views:
* SELECT * is not fast. I can't SELECT * on a view and get data immediately the 
way I would by writing the same query directly. This is true even when the 
schemas are exactly the same.
* Partitions are not see-through. I can't use SHOW PARTITIONS on a view or 
write any automated system based on the view to detect when new partitions 
land, which forces reference back to the original table, and then all is lost.



 Column aliases
 --

 Key: HIVE-2231
 URL: https://issues.apache.org/jira/browse/HIVE-2231
 Project: Hive
  Issue Type: Wish
  Components: Query Processor
Reporter: Adam Kramer
Priority: Trivial

 It would be nice in several cases to be able to alias column names.
 Say someone in your company CREATEd a TABLE called important_but_named_poorly 
 (alvin BIGINT, theodore BIGINT, simon STRING) PARTITIONED BY (dave STRING), 
 that indexes the relationship between an actor (alvin), a target (theodore), 
 and the interaction between them (simon), partitioned based on the date 
 string (dave). Renaming the columns would break a million pipelines that are 
 important but ownerless.
 It would be awesome to define an aliasing system as such:
 ALTER TABLE important_but_named_poorly REPLACE COLUMNS (actor BIGINT AKA 
 alvin, target BIGINT AKA theodore, ixn STRING AKA simon) PARTITIONED BY (ds 
 STRING AKA dave);
 ...which would mean that any user could, e.g., use the term dave to refer 
 to ds if they really wanted to.





[jira] [Updated] (HIVE-1955) Support non-constant expressions for array indexes.

2011-07-26 Thread Adam Kramer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kramer updated HIVE-1955:
--

Description: 
FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array 
Indexes not Supported dut

...just wrote my own UDF to do this, and it is trivial. We should support this 
natively.

Let foo have these rows:
arr   i
[1,2,3]   1
[3,4,5]   2
[5,4,3]   2
[0,0,1]   0

Then,
SELECT arr[i] FROM foo
should return:
2
5
3
0

Similarly, for the same table,
SELECT 3 IN arr FROM foo
should return:
true
true
true
false

...these use cases are needless limitations of functionality. We shouldn't need 
UDFs to accomplish these goals.

  was:
FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array 
Indexes not Supported dut

...just wrote my own UDF to do this, and it is trivial. We should support this 
natively.


 Support non-constant expressions for array indexes.
 ---

 Key: HIVE-1955
 URL: https://issues.apache.org/jira/browse/HIVE-1955
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer

 FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for 
 Array Indexes not Supported dut
 ...just wrote my own UDF to do this, and it is trivial. We should support 
 this natively.
 Let foo have these rows:
 arr   i
 [1,2,3]   1
 [3,4,5]   2
 [5,4,3]   2
 [0,0,1]   0
 Then,
 SELECT arr[i] FROM foo
 should return:
 2
 5
 3
 0
 Similarly, for the same table,
 SELECT 3 IN arr FROM foo
 should return:
 true
 true
 true
 false
 ...these use cases are needless limitations of functionality. We shouldn't 
 need UDFs to accomplish these goals.
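Assuming Hive's 0-based array indexing, the requested semantics can be sketched in plain Python (table contents taken from the example above):

```python
# Rows of the hypothetical table foo: (arr, i)
rows = [([1, 2, 3], 1), ([3, 4, 5], 2), ([5, 4, 3], 2), ([0, 0, 1], 0)]

# SELECT arr[i] FROM foo -- index one column by another (0-based)
print([arr[i] for arr, i in rows])       # [2, 5, 3, 0]

# SELECT 3 IN arr FROM foo -- membership test against an array column
print([3 in arr for arr, _ in rows])     # [True, True, True, False]
```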





[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification

2011-07-26 Thread Adam Kramer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kramer updated HIVE-1466:
--

Description: 
NULL values are passed to transformers as a literal backslash and a literal N. 
NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This 
is inconsistent.

The ROW FORMAT specification of tables should be able to specify the manner in 
which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or 
'\003' or whatever should apply to all instances of table export and saving.

  was:
I just updated the Hive wiki to clarify what some would consider an oddity: 
When NULL values are exported to a script via TRANSFORM, they are converted to 
the string \N, and then when the script's output is read, any cell that 
contains only \N is treated as a NULL value.

I believe that there are very VERY few reasons why anyone would need cells that 
contain only a backslash and then a capital N to be distinguished from NULL 
cells, but for complete generality, we should allow this.

The way to do that is probably by adding a specification in the ROW FORMAT for 
a table that would allow any string to be treated as a NULL if it is the only 
string in a cell. Some may prefer the empty string, others the word NULL in 
caps, etc. I vote for keeping \N as the default because I am used to it, but 
also for allowing this to be customized.


 Add NULL DEFINED AS to ROW FORMAT specification
 ---

 Key: HIVE-1466
 URL: https://issues.apache.org/jira/browse/HIVE-1466
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer

 NULL values are passed to transformers as a literal backslash and a literal 
 N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. 
 This is inconsistent.
 The ROW FORMAT specification of tables should be able to specify the manner 
 in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or 
 '\003' or whatever should apply to all instances of table export and saving.
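A configurable NULL token in a delimited serializer/deserializer can be sketched as follows; the function names are illustrative, not Hive's SerDe API:

```python
def serialize(row, null_token="\\N", delim="\t"):
    # Emit the NULL token for missing values, per the table's ROW FORMAT.
    return delim.join(null_token if v is None else v for v in row)

def deserialize(line, null_token="\\N", delim="\t"):
    # Any cell that contains only the NULL token is read back as NULL.
    return [None if cell == null_token else cell for cell in line.split(delim)]

row = ["a", None, "c"]
# Round-trips with the default \N token...
assert deserialize(serialize(row)) == row
# ...and the token itself is a per-table choice, as the issue proposes.
print(serialize(row, null_token="NULL"))   # a	NULL	c
```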





[jira] [Created] (HIVE-2311) TRANSFORM statements should come with their own ROW FORMATs.

2011-07-26 Thread Adam Kramer (JIRA)
TRANSFORM statements should come with their own ROW FORMATs.


 Key: HIVE-2311
 URL: https://issues.apache.org/jira/browse/HIVE-2311
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Adam Kramer


Sometimes Hive tables contain tabs and/or other characters that could easily be 
misinterpreted by a transformer as a delimiter. This can break many TRANSFORM 
queries.

The solution is to have a ROW FORMAT semantics that can be attached to an 
individual TRANSFORM instance. It would have the same semantics as table 
creation, but during serialization it would ensure that any formal delimiter 
characters that did not indicate an actual break between columns would be 
escaped.

At the very least, it is a bug that TRANSFORM statement deserialization does 
not backslash out literal tabs in the current implementation.
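The failure mode can be reproduced without Hive at all. Assuming tab-delimited TRANSFORM I/O, a literal tab inside a value shifts every later column:

```python
row = ["a\tb", "c"]                 # first column contains a literal tab
line = "\t".join(row)               # what a TRANSFORM script would receive
# The script cannot tell the embedded tab from a delimiter: 3 fields, not 2.
print(line.split("\t"))             # ['a', 'b', 'c']

# With delimiter escaping on serialization, the column boundary survives.
escaped = "\t".join(v.replace("\t", "\\t") for v in row)
print(escaped.split("\t"))          # ['a\\tb', 'c']
```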





[jira] [Updated] (HIVE-2311) TRANSFORM statements should come with their own ROW FORMATs.

2011-07-26 Thread Adam Kramer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kramer updated HIVE-2311:
--

  Priority: Minor  (was: Major)
Issue Type: Bug  (was: Improvement)

 TRANSFORM statements should come with their own ROW FORMATs.
 

 Key: HIVE-2311
 URL: https://issues.apache.org/jira/browse/HIVE-2311
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Adam Kramer
Priority: Minor

 Sometimes Hive tables contain tabs and/or other characters that could easily 
 be misinterpreted by a transformer as a delimiter. This can break many 
 TRANSFORM queries.
 The solution is to have a ROW FORMAT semantics that can be attached to an 
 individual TRANSFORM instance. It would have the same semantics as table 
 creation, but during serialization it would ensure that any formal delimiter 
 characters that did not indicate an actual break between columns would be 
 escaped.
 At the very least, it is a bug that TRANSFORM statement deserialization does 
 not backslash out literal tabs in the current implementation.





[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename

2011-07-26 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-2309:


Attachment: HIVE-2309.2.patch

 Incorrect regular expression for extracting task id from filename
 -

 Key: HIVE-2309
 URL: https://issues.apache.org/jira/browse/HIVE-2309
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.1
Reporter: Paul Yang
Assignee: Paul Yang
Priority: Minor
 Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch


 For producing the correct filenames for bucketed tables, there is a method in 
 Utilities.java that extracts out the task id from the filename and replaces 
 it with the bucket number. There is a bug in the regex that is used to 
 extract this value for attempt numbers >= 10:
 {code}
  >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
  ...          'attempt_201107090429_64965_m_001210_10').group(1)
 '10'
  >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
  ...          'attempt_201107090429_64965_m_001210_9').group(1)
 '001210'
 {code}





[jira] [Created] (HIVE-2312) Make CLI variables available to UDFs

2011-07-26 Thread Adam Kramer (JIRA)
Make CLI variables available to UDFs


 Key: HIVE-2312
 URL: https://issues.apache.org/jira/browse/HIVE-2312
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients, UDF
Reporter: Adam Kramer


Straightforward use case: My UDFs should be able to condition on whether 
hive.mapred.mode=strict or nonstrict.

But these things could also be useful for certain optimizations. For example, a 
UDAF knowing that there is only one reduce phase could avoid a lot of pushing 
data around unnecessarily.





[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename

2011-07-26 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071420#comment-13071420
 ] 

Siying Dong commented on HIVE-2309:
---

+1, will commit after tests pass

 Incorrect regular expression for extracting task id from filename
 -

 Key: HIVE-2309
 URL: https://issues.apache.org/jira/browse/HIVE-2309
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.1
Reporter: Paul Yang
Assignee: Paul Yang
Priority: Minor
 Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch


 For producing the correct filenames for bucketed tables, there is a method in 
 Utilities.java that extracts out the task id from the filename and replaces 
 it with the bucket number. There is a bug in the regex that is used to 
 extract this value for attempt numbers >= 10:
 {code}
  >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
  ...          'attempt_201107090429_64965_m_001210_10').group(1)
 '10'
  >>> re.match("^.*?([0-9]+)(_[0-9])?(\\..*)?$",
  ...          'attempt_201107090429_64965_m_001210_9').group(1)
 '001210'
 {code}





[jira] [Updated] (HIVE-89) avg() min() max() will get error message

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-89:
---

Fix Version/s: 0.3.0

 avg() min() max() will get error message
 

 Key: HIVE-89
 URL: https://issues.apache.org/jira/browse/HIVE-89
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: hadoop 0.17.2.1 hive 0.17.0
Reporter: YihueyChyi
Assignee: Zheng Shao
 Fix For: 0.3.0


 When I run select min(), max() or avg(), I get an error message
 Test table : data rows: 15835023
 error message: FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.ExecDriver
 Hadoop web:50030 message
 From reduce process
 java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.reflect.InvocationTargetException
   at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:173)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
   at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.reflect.InvocationTargetException
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:243)
   at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:168)
   ... 2 more
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:210)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:297)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:240)
   ... 3 more
 Caused by: java.lang.NumberFormatException: For input string: "2004-12-22"
   at 
 sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
   at java.lang.Double.parseDouble(Double.java:510)
   at org.apache.hadoop.hive.ql.udf.UDAFAvg.aggregate(UDAFAvg.java:42)
   ... 10 more
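The root cause is visible at the bottom of the trace: UDAFAvg hands a date string to Double.parseDouble. A minimal Python analogue of the failing aggregation (illustrative only, not Hive's implementation):

```python
def avg(values):
    # Mirrors UDAFAvg: every input is parsed as a number before summing.
    total = 0.0
    for v in values:
        total += float(v)        # float("2004-12-22") raises ValueError,
    return total / len(values)   # like Double.parseDouble's NumberFormatException

print(avg(["1.0", "2.0", "3.0"]))    # 2.0

try:
    avg(["1.0", "2004-12-22"])       # a non-numeric column value
except ValueError:
    print("aggregation fails on the date string")
```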
  





[jira] [Updated] (HIVE-1251) TRANSFORM should allow piping or allow cross-subquery assumptions.

2011-07-26 Thread Adam Kramer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kramer updated HIVE-1251:
--

Description: 
Many traditional transforms can be accomplished via simple unix commands 
chained together. For example, the sort phase is an instance of "cut -f 1 | 
sort". However, the TRANSFORM command in Hive doesn't allow for unix-style 
piping to occur.

One classic case where I wish there was piping is when I want to stack a 
column into several rows:

SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py | python 
reducer.py' AS key, value

...in this case, stacker.py would produce output of this form:
key col0
key col1
key col2
...and then the reducer would reduce the above down to one item per key. In 
this case, the current workaround is this:

SELECT TRANSFORM(a.key, a.col) USING 'python reducer.py' AS key, value FROM
(SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py' AS key, 
col FROM table)

...the problem here is that for the above to work (and it should, indeed, work 
in a map-only MR task), I must assume that the data output from one subquery 
will be passed in EXACTLY THE SAME FORMAT to the outer query--i.e., I must 
assume that Hive will not cut a map or reduce phase in between, or fan out 
data from the inner query into different mappers in the outer query.

As a user, *I should not be allowed to assume* that data coming out of a 
subquery goes into the nodes for a superquery in the same order...ESPECIALLY in 
the map phase.
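What the two scripts in the workaround compute can be sketched directly; stacker.py and reducer.py are the author's placeholder names, and the reducer below deliberately leans on the fragile assumption the issue describes, that pairs for the same key arrive contiguously:

```python
def stack(rows):
    # stacker.py: (key, col0, col1, col2) -> one (key, col) pair per column
    for key, *cols in rows:
        for col in cols:
            yield (key, col)

def reduce_per_key(pairs, combine=max):
    # reducer.py: collapse stacked pairs to one value per key; assumes
    # same-key pairs are adjacent (the cross-subquery ordering assumption).
    out, current_key, acc = [], None, None
    for key, value in pairs:
        if key != current_key:
            if current_key is not None:
                out.append((current_key, acc))
            current_key, acc = key, value
        else:
            acc = combine(acc, value)
    if current_key is not None:
        out.append((current_key, acc))
    return out

rows = [("k1", 1, 5, 3), ("k2", 2, 2, 9)]
print(reduce_per_key(stack(rows)))   # [('k1', 5), ('k2', 9)]
```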

  was:
Many traditional transforms can be accomplished via simple unix commands 
chained together. For example, the sort phase is an instance of "cut -f 1 | 
sort". However, the TRANSFORM command in Hive doesn't allow for unix-style 
piping to occur.

One classic case where I wish there was piping is when I want to stack a 
column into several rows:

SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py | python 
reducer.py' AS key, value

...in this case, stacker.py would produce output of this form:
key col0
key col1
key col2
...and then the reducer would reduce the above down to one item per key. In 
this case, the current workaround is this:

SELECT TRANSFORM(a.key, a.col) USING 'python reducer.py' AS key, value FROM
(SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py' AS key, 
col FROM table)

...the problem here is that as a user, *I should not be allowed to assume* that 
the output from the inner query will be passed DIRECTLY to the outer query 
(i.e., the outer query should not assume that it gets the inner query's output 
on the same box and in the same order). I know as a programmer that this works 
fine as a pipe, but when writing Hive code I always wonder--what if Hive 
decides to run the inner query in a reduce step, and the outer query in a 
subsequent map step?

Broadly, my understanding is that the goal of Hive is to abstract the mapreduce 
process away from users. To this end, we have syntax (CLUSTER BY) that allows 
users to assume that a reduce task will occur (but see also 
https://issues.apache.org/jira/browse/HIVE-835 ), but there is no formal way to 
force or syntactically assume that the data will NOT be copied or sorted or 
transformed. I argue that the only case where this would be necessary or 
desirable would be in the instance of a pipe within a transform...ergo a desire 
for | to work as expected.

An alternative would be for the HQL language definition to explicitly state all 
conditions that would cause a task boundary to be crossed (so I can make the 
strong assumption that if none of those conditions obtains, my query will be 
supported in the future)...but that seems potentially restrictive as the 
language and Hadoop evolves.


Summary: TRANSFORM should allow piping or allow cross-subquery 
assumptions.  (was: TRANSFORM should allow pipes in some form)

 TRANSFORM should allow piping or allow cross-subquery assumptions.
 --

 Key: HIVE-1251
 URL: https://issues.apache.org/jira/browse/HIVE-1251
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer

 Many traditional transforms can be accomplished via simple unix commands 
 chained together. For example, the sort phase is an instance of "cut -f 1 | 
 sort". However, the TRANSFORM command in Hive doesn't allow for unix-style 
 piping to occur.
 One classic case where I wish there was piping is when I want to stack a 
 column into several rows:
 SELECT TRANSFORM(key, col0, col1, col2) USING 'python stacker.py | python 
 reducer.py' AS key, value
 ...in this case, stacker.py would produce output of this form:
 key col0
 key col1
 key col2
 ...and then the reducer would reduce the above down to one item per key. In 
 this case, the current workaround is this:
 SELECT TRANSFORM(a.key, a.col) USING 

[jira] [Updated] (HIVE-10) [Hive] filter is executed after the join

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-10:
---

Fix Version/s: 0.3.0

 [Hive] filter is executed after the join
 

 Key: HIVE-10
 URL: https://issues.apache.org/jira/browse/HIVE-10
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.3.0


 Filter is not pushed above the join in Hive currently. This can be pretty 
 expensive if the filter is highly selective.
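The cost difference can be illustrated with a toy nested-loop join in Python (table contents invented for the example):

```python
left  = [(k, "L%d" % k) for k in range(100)]
right = [(k, "R%d" % k) for k in range(100)]
selective = lambda k: k % 10 == 0             # keeps 10% of rows

# Filter evaluated after the join: all 100 matches are materialized first.
after  = [(l, r) for l in left for r in right
          if l[0] == r[0] and selective(l[0])]

# Filter pushed above (before) the join: only 10 left rows reach the join.
pushed = [(l, r) for l in left if selective(l[0])
          for r in right if l[0] == r[0]]

assert after == pushed                        # same answer, far fewer comparisons
print(len(pushed))                            # 10
```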





[jira] [Updated] (HIVE-39) Hive: we should be able to specify a column without a table/alias name

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-39:
---

Fix Version/s: 0.3.0

 Hive: we should be able to specify a column without a table/alias name
 --

 Key: HIVE-39
 URL: https://issues.apache.org/jira/browse/HIVE-39
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Zheng Shao
Assignee: Ashish Thusoo
 Fix For: 0.3.0


 "SELECT field1, field2 from table1" should work, just as "SELECT 
 table1.field1, table1.field2 from table1" does.
 For join, the situation will be a bit more complicated.  If the 2 join 
 operands have columns of the same name, then we should output an ambiguity 
 error.





[jira] [Updated] (HIVE-58) [hive] join condition does not allow a simple filter

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-58:
---

Fix Version/s: 0.3.0

 [hive] join condition does not allow a simple filter
 

 Key: HIVE-58
 URL: https://issues.apache.org/jira/browse/HIVE-58
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.3.0


 In the join condition, a simple filter condition cannot be specified.
 For example,
   select  from A join B ON (A.a = B.b and A.x = 10);
 is not supported.  This can be very useful, especially in the case of outer joins.
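For outer joins the placement of such a filter matters semantically, not just syntactically; a toy Python model (tables invented for illustration) shows why a filter in the ON clause differs from one applied after the join:

```python
def left_outer_join(A, B, on):
    # Toy LEFT OUTER JOIN: unmatched left rows survive with a NULL right side.
    out = []
    for a in A:
        matched = [(a, b) for b in B if on(a, b)]
        out.extend(matched if matched else [(a, None)])
    return out

A = [(1, 10), (2, 20)]   # rows of A: (a, x)
B = [(1,), (2,)]         # rows of B: (b,)

# Filter inside the ON clause: row (2, 20) keeps a NULL-padded output row.
on_result = left_outer_join(A, B, lambda a, b: a[0] == b[0] and a[1] == 10)

# Same filter in a WHERE clause after the join: row (2, 20) disappears.
where_result = [(a, b)
                for (a, b) in left_outer_join(A, B, lambda a, b: a[0] == b[0])
                if a[1] == 10]

print(on_result)      # [((1, 10), (1,)), ((2, 20), None)]
print(where_result)   # [((1, 10), (1,))]
```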





[jira] [Updated] (HIVE-26) [Hive] uppercase alias with a join not working

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-26:
---

Fix Version/s: 0.3.0

 [Hive] uppercase alias with a join not working
 --

 Key: HIVE-26
 URL: https://issues.apache.org/jira/browse/HIVE-26
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.3.0


 EXPLAIN FROM 
 (SELECT src.* FROM src) x
 JOIN 
 (SELECT src.* FROM src) Y
 ON (x.key = Y.key)
 SELECT Y.*;





[jira] [Updated] (HIVE-836) Add syntax to force a new mapreduce job / transform subquery in mapper

2011-07-26 Thread Adam Kramer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kramer updated HIVE-836:
-

Description: 
Hive currently does a lot of awesome work to figure out when my transformers 
should be used in the mapper and when they should be used in the reducer. 
However, sometimes I have a different plan.

For example, consider this:

{code:title=foo.sql}
SELECT TRANSFORM(a.val1, a.val2)
USING './niftyscript'
AS part1, part2, part3
FROM (
SELECT b.val AS val1, c.val AS val2
FROM tblb b JOIN tblc c on (b.key=c.key)
) a
{code}

...now, assume that the join step is very easy and 'niftyscript' is really 
processor intensive. The ideal format for this is a MR task with few mappers 
and few reducers, and then a second MR task with lots of mappers.

Currently, there is no way to even require the outer TRANSFORM statement occur 
in a separate map phase. Implementing a hint such as /* +MAP */, akin to /* 
+MAPJOIN(x) */, would be awesome.

Current workaround is to dump everything to a temporary table and then start 
over, but that is not easy to scale--the subquery structure effectively (and 
easily) locks the mid-points so no other job can touch the table.

  was:
Hive currently does a lot of awesome work to figure out when my transformers 
should be used in the mapper and when they should be used in the reducer. 
However, sometimes I have a different plan.

For example, consider this:

SELECT TRANSFORM(a.val1, a.val2)
USING './niftyscript'
AS part1, part2, part3
FROM (
SELECT b.val AS val1, c.val AS val2
FROM tblb b JOIN tblc c on (b.key=c.key)
) a

...in this syntax b and c will be joined (in the reducer, of course), and then 
the rows that pass the join clause will be passed to niftyscript _in the 
reducer._ However, when niftyscript is high-computation and there is a lot of 
data coming out of the join but very few reducers, there's a huge hold-up. It 
would be awesome if I could somehow force a new mapreduce step after the 
subquery, so that ./niftyscript is run in the mappers rather than the prior 
step's reducers.

Current workaround is to dump everything to a temporary table and then start 
over, but that is not easy to scale--the subquery structure effectively (and 
easily) locks the mid-points so no other job can touch the table.

SUGGESTED FIX: Either cause MAP and REDUCE to force map/reduce steps (c.f. 
https://issues.apache.org/jira/browse/HIVE-835 ), or add a query element to 
specify that the job ends here. For example, in the above query, FROM a 
SELF-CONTAINED or PRECOMPUTE a or START JOB AFTER a or something like that.



 Add syntax to force a new mapreduce job / transform subquery in mapper
 --

 Key: HIVE-836
 URL: https://issues.apache.org/jira/browse/HIVE-836
 Project: Hive
  Issue Type: Wish
Reporter: Adam Kramer

 Hive currently does a lot of awesome work to figure out when my transformers 
 should be used in the mapper and when they should be used in the reducer. 
 However, sometimes I have a different plan.
 For example, consider this:
 {code:title=foo.sql}
 SELECT TRANSFORM(a.val1, a.val2)
 USING './niftyscript'
 AS part1, part2, part3
 FROM (
 SELECT b.val AS val1, c.val AS val2
 FROM tblb b JOIN tblc c on (b.key=c.key)
 ) a
 {code}
 ...now, assume that the join step is very easy and 'niftyscript' is really 
 processor intensive. The ideal format for this is a MR task with few mappers 
 and few reducers, and then a second MR task with lots of mappers.
 Currently, there is no way to even require the outer TRANSFORM statement 
 occur in a separate map phase. Implementing a hint such as /* +MAP */, akin 
 to /* +MAPJOIN(x) */, would be awesome.
 Current workaround is to dump everything to a temporary table and then start 
 over, but that is not easy to scale--the subquery structure effectively 
 (and easily) locks the mid-points so no other job can touch the table.





[jira] [Updated] (HIVE-141) drop table partition behaving oddly - does not create subdirectories

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-141:


Fix Version/s: 0.3.0

 drop table partition behaving oddly - does not create subdirectories
 

 Key: HIVE-141
 URL: https://issues.apache.org/jira/browse/HIVE-141
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Hao Liu
Assignee: Prasad Chakka
Priority: Critical
 Fix For: 0.3.0

   Original Estimate: 4h
  Remaining Estimate: 4h

 for example, I have a table, which has two partitions:
 tmp_table_name/dt=2008-11-01
 tmp_table_name/dt=2008-11-02
 When we use hive metastore to drop the first partition (as root), I expect 
 the data file will be moved to 
 user/root/.Trash/081103/tmp_table_name/dt=2008-11-01 by default. However, 
 directory tmp_table_name was not created, the data was moved to 
 user/root/.Trash/081103/dt=2008-11-01, which makes data recovery a very 
 difficult task.





[jira] [Updated] (HIVE-66) Insert into a dynamic serde table from a MetadataTypedColumnSetSerDe

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-66:
---

Fix Version/s: 0.3.0

 Insert into a dynamic serde table from a MetadataTypedColumnSetSerDe
 

 Key: HIVE-66
 URL: https://issues.apache.org/jira/browse/HIVE-66
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
Priority: Critical
 Fix For: 0.3.0


 Fails with column mismatch error.





[jira] [Updated] (HIVE-106) Join operation fails for some queries

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-106:


Fix Version/s: 0.8.0

 Join operation fails for some queries
 -

 Key: HIVE-106
 URL: https://issues.apache.org/jira/browse/HIVE-106
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Josh Ferguson
Assignee: Namit Jain
Priority: Critical
 Fix For: 0.8.0


 The Tables Are
 CREATE TABLE activities 
 (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>) 
 PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) 
 CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS 
 ROW FORMAT DELIMITED 
 COLLECTION ITEMS TERMINATED BY '44'
 MAP KEYS TERMINATED BY '58'
 STORED AS TEXTFILE;
 Detailed Table Information:
 Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null),
  FieldSchema(name:actee_id,type:string,comment:null), 
 FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id,
  
 actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
  FieldSchema(name:application,type:string,comment:null), 
 FieldSchema(name:dataset,type:string,comment:null), 
 FieldSchema(name:hour,type:int,comment:null)],parameters:{})
 CREATE TABLE users 
 (id STRING, properties MAP<STRING, STRING>) 
 PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) 
 CLUSTERED BY (id) INTO 32 BUCKETS 
 ROW FORMAT DELIMITED 
 COLLECTION ITEMS TERMINATED BY '44'
 MAP KEYS TERMINATED BY '58'
 STORED AS TEXTFILE;
 Detailed Table Information:
 Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null),
  
 FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
  FieldSchema(name:application,type:string,comment:null), 
 FieldSchema(name:dataset,type:string,comment:null), 
 FieldSchema(name:hour,type:int,comment:null)],parameters:{})
 A working query is
 SELECT activities.* FROM activities WHERE activities.dataset='poke' AND 
 activities.properties['verb'] = 'Dance';
 A non working query is
 SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON 
 activities.actor_id = users.id WHERE activities.dataset='poke' AND 
 activities.properties['verb'] = 'Dance';
 The Exception Is
 java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index 
 expression on string
    at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
    at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
    at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
   at 

[jira] [Updated] (HIVE-145) Hive wiki provides incorrect download and setup instructions

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-145:


Fix Version/s: 0.3.0

 Hive wiki provides incorrect download and setup instructions
 

 Key: HIVE-145
 URL: https://issues.apache.org/jira/browse/HIVE-145
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Aaron Kimball
Assignee: Raghotham Murthy
 Fix For: 0.3.0


 The Getting Started instructions at 
 http://wiki.apache.org/hadoop/Hive/GettingStarted are incorrect. They claim 
 that you should download a dist-17.tar.gz file from a Facebook mirror. This 
 link returns a 404, and Facebook does not seem to maintain a publicly 
 available Hive package at any other location I can find. The wiki should 
 therefore be updated to instruct users to check out or export the files from 
 SVN. (The page is locked, so I can't change it myself.)





[jira] [Updated] (HIVE-835) Deprecate, remove, or fix MAP and REDUCE syntax.

2011-07-26 Thread Adam Kramer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kramer updated HIVE-835:
-

Summary: Deprecate, remove, or fix MAP and REDUCE syntax.  (was: Make MAP 
and REDUCE work as expected or add warnings)

 Deprecate, remove, or fix MAP and REDUCE syntax.
 

 Key: HIVE-835
 URL: https://issues.apache.org/jira/browse/HIVE-835
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer

 There are syntactic elements MAP and REDUCE which function as syntactic sugar 
 for SELECT TRANSFORM. This behavior is not at all intuitive, because no 
 checking or verification is done to ensure that the user's intention is met.
 Specifically, Hive may see a MAP query and simply tack the transform script 
 on to the end of a reduce job (so the user says MAP but Hive does a REDUCE), 
 or, more dangerously, vice versa. Given that Hive's whole point is to sit on 
 top of a mapreduce framework and allow transformations in the mapper or 
 reducer, it seems very inappropriate for Hive to ignore a clear command from 
 the user to MAP or to REDUCE the data using a script.
 Better behavior would be for Hive to see a MAP command, start a new 
 mapreduce step, and run the command in the mapper (even if it otherwise would 
 be run in the reducer), and for REDUCE to begin a reduce step if necessary 
 (i.e., tack the REDUCE script on to the end of a reduce job if the current 
 system would do so; otherwise, treat the 0th column as the reduce key, throw 
 a warning saying this has been done, and force a reduce job).
 Acceptable behavior would be to throw an error or warning when the user's 
 clearly-stated desire is going to be ignored: "User used MAP keyword, but 
 transformation will occur in the reduce phase" / "User used REDUCE keyword, 
 but did not specify a DISTRIBUTE BY / CLUSTER BY column. Transformation will 
 occur in the map phase."
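The acceptable-behavior proposal above — warn whenever the user's MAP/REDUCE keyword will not match the phase Hive actually uses — can be sketched as a small decision rule. This is a hypothetical illustration of the proposed checks, not Hive code; all names are invented:

```python
def transform_warnings(keyword, has_distribute_or_cluster_by, runs_in_reduce):
    """Return the warnings the proposal above would emit.

    keyword: "MAP" or "REDUCE" as written by the user.
    has_distribute_or_cluster_by: whether a DISTRIBUTE BY / CLUSTER BY
    clause was given.
    runs_in_reduce: where the planner would actually run the script.
    (Toy model of the proposed behavior, not Hive's actual planner.)
    """
    warnings = []
    # User asked for MAP, but the script would be tacked onto a reduce job.
    if keyword == "MAP" and runs_in_reduce:
        warnings.append("User used MAP keyword, but transformation will "
                        "occur in the reduce phase")
    # User asked for REDUCE without saying how to distribute the rows.
    if keyword == "REDUCE" and not has_distribute_or_cluster_by:
        warnings.append("User used REDUCE keyword, but did not specify "
                        "DISTRIBUTE BY / CLUSTER BY column. Transformation "
                        "will occur in the map phase")
    return warnings
```

A real implementation would live in the semantic analyzer, where both the user's keyword and the planned phase are visible.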





[jira] [Updated] (HIVE-211) Add metastore_db to svn ignore

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-211:


Fix Version/s: 0.3.0

 Add metastore_db to svn ignore
 --

 Key: HIVE-211
 URL: https://issues.apache.org/jira/browse/HIVE-211
 Project: Hive
  Issue Type: Task
Reporter: Johan Oskarsson
Assignee: Zheng Shao
Priority: Trivial
 Fix For: 0.3.0


 As per HIVE-101, add the metastore_db directory to svn:ignore since it 
 shouldn't be committed or included in any patches.





[jira] [Updated] (HIVE-835) Deprecate, remove, or fix MAP and REDUCE syntax.

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-835:


Component/s: SQL

 Deprecate, remove, or fix MAP and REDUCE syntax.
 

 Key: HIVE-835
 URL: https://issues.apache.org/jira/browse/HIVE-835
 Project: Hive
  Issue Type: Improvement
  Components: SQL
Reporter: Adam Kramer

 There are syntactic elements MAP and REDUCE which function as syntactic sugar 
 for SELECT TRANSFORM. This behavior is not at all intuitive, because no 
 checking or verification is done to ensure that the user's intention is met.
 Specifically, Hive may see a MAP query and simply tack the transform script 
 on to the end of a reduce job (so the user says MAP but Hive does a REDUCE), 
 or, more dangerously, vice versa. Given that Hive's whole point is to sit on 
 top of a mapreduce framework and allow transformations in the mapper or 
 reducer, it seems very inappropriate for Hive to ignore a clear command from 
 the user to MAP or to REDUCE the data using a script.
 Better behavior would be for Hive to see a MAP command, start a new 
 mapreduce step, and run the command in the mapper (even if it otherwise would 
 be run in the reducer), and for REDUCE to begin a reduce step if necessary 
 (i.e., tack the REDUCE script on to the end of a reduce job if the current 
 system would do so; otherwise, treat the 0th column as the reduce key, throw 
 a warning saying this has been done, and force a reduce job).
 Acceptable behavior would be to throw an error or warning when the user's 
 clearly-stated desire is going to be ignored: "User used MAP keyword, but 
 transformation will occur in the reduce phase" / "User used REDUCE keyword, 
 but did not specify a DISTRIBUTE BY / CLUSTER BY column. Transformation will 
 occur in the map phase."





[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.

2011-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071426#comment-13071426
 ] 

Hudson commented on HIVE-2226:
--

Integrated in Hive-trunk-h0.21 #851 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/851/])
HIVE-2226. Add API to retrieve table names by an arbitrary filter, e.g., by 
owner, retention, parameters, etc. (Sohan Jain via pauly)

pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151213
Files : 
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
* 
/hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Constants.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
* /hive/trunk/metastore/if/hive_metastore.thrift
* 
/hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
* 
/hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_constants.php
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp
* 
/hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp
* 
/hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
* /hive/trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/constants.py
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
* 
/hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java
* 
/hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb


 Add API to retrieve table names by an arbitrary filter, e.g., by owner, 
 retention, parameters, etc.
 ---

 Key: HIVE-2226
 URL: https://issues.apache.org/jira/browse/HIVE-2226
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Fix For: 0.8.0

 Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch


 Create a function called get_table_names_by_filter that returns a list of 
 table names in a database that match a certain filter.  The filter should 
 operate similarly to the one in HIVE-1609.  Initially, you should be able to prune 
 the table list based on owner, retention, or table parameter key/values.  The 
 filtering should take place at the JDO level for efficiency/speed.
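The semantics described above — prune a database's table list by owner, retention, or a parameter key/value — can be modeled in plain Python. This is only an illustration of the filter behavior against stand-in dicts; the real implementation pushes the filter down to the JDO layer, and none of these names are Hive's actual types:

```python
def table_names_by_filter(tables, owner=None, min_retention=None, param=None):
    """Model of the get_table_names_by_filter semantics described above.

    `tables` is a list of dicts standing in for metastore table rows
    (illustrative only -- the real filtering happens at the JDO level).
    `param` is an optional (key, value) pair matched against the table's
    parameters map.
    """
    names = []
    for t in tables:
        if owner is not None and t["owner"] != owner:
            continue
        if min_retention is not None and t["retention"] < min_retention:
            continue
        if param is not None:
            key, value = param
            if t.get("parameters", {}).get(key) != value:
                continue
        names.append(t["name"])
    return names
```

Doing this at the JDO level means the predicate runs inside the database query instead of materializing every table object first, which is where the efficiency/speed benefit comes from.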





[jira] [Updated] (HIVE-150) group by count(1) will get error

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-150:


Fix Version/s: 0.3.0

 group by count(1) will get error
 

 Key: HIVE-150
 URL: https://issues.apache.org/jira/browse/HIVE-150
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
 Environment: HADOOP 0.17.2.1 
Reporter: YihueyChyi
 Fix For: 0.3.0

 Attachments: hive-150.1.patch


 HIVEQL:  select l.http_user_agent,count(1) from log_resume_all l  group by 
 l.http_user_agent
 Sometimes I get an error in the second stage:
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.ExecDriver
 The second stage :
 map error 
 java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:151)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:250)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:174)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
 syslog logs
 2008-12-10 15:41:15,209 DEBUG org.apache.hadoop.mapred.TaskTracker: Child 
 starting
 2008-12-10 15:41:15,717 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
 Initializing JVM Metrics with processName=MAP, sessionId=
 2008-12-10 15:41:15,805 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 64
 2008-12-10 15:41:16,252 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded 
 the native-hadoop library
 2008-12-10 15:41:16,253 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: 
 Successfully loaded & initialized native-zlib library
 2008-12-10 15:41:16,424 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
 Initializing Self
 2008-12-10 15:41:16,428 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
 Adding alias /tmp/hive-root/462573742/46102483.10002 to work list for file 
 /tmp/hive-root/462573742/46102483.10002/0015_r_29_0
 2008-12-10 15:41:16,438 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Got 
 partitions: null
 2008-12-10 15:41:16,438 INFO 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Initializing Self
 2008-12-10 15:41:16,443 INFO 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Using tag = -1
 2008-12-10 15:41:16,460 INFO 
 org.apache.hadoop.hive.serde2.thrift.TBinarySortableProtocol: Sort order is 
 2008-12-10 15:41:16,460 INFO 
 org.apache.hadoop.hive.serde2.thrift.TBinarySortableProtocol: Sort order is 
 2008-12-10 15:41:16,489 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
 DESERIALIZE_ERRORS:0
 2008-12-10 15:41:16,495 WARN org.apache.hadoop.mapred.TaskTracker: Error 
 running child
 java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:151)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:250)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:174)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:71)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)





Re: Review Request: HIVE-2286: ClassCastException when building index with security.authorization turned on

2011-07-26 Thread Syed Albiz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1137/
---

(Updated 2011-07-26 23:28:13.279889)


Review request for hive, John Sichi and Ning Zhang.


Changes
---

refactor patch to dump query state into an inner class rather than a Stack.


Summary
---

Save the original HiveOperation/commandType when we generate the index builder 
task and restore it after we're done generating the task so that the 
authorization checks make the right decision when deciding what to do.
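The save/restore pattern described in the summary can be sketched generically: snapshot the driver's operation/command type before generating the nested index-builder task, then restore it in a finally block so later authorization checks see the original state. This is a toy model; the class and field names are invented and are not Hive's actual Driver internals:

```python
class QueryState:
    """Minimal stand-in for the saved query state (illustrative names)."""
    def __init__(self, operation, command_type):
        self.operation = operation
        self.command_type = command_type


def with_saved_state(driver_state, generate_nested_task):
    """Run generate_nested_task(), which may clobber driver_state's
    operation/command type, and restore the original values afterwards."""
    saved = QueryState(driver_state.operation, driver_state.command_type)
    try:
        return generate_nested_task()  # may overwrite driver_state fields
    finally:
        # Restore so downstream authorization sees the original operation.
        driver_state.operation = saved.operation
        driver_state.command_type = saved.command_type
```

Bundling the saved fields into one inner-class-like object (rather than pushing them individually onto a Stack) keeps the snapshot atomic, which matches the refactoring noted in the changes above.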


This addresses bug HIVE-2286.
https://issues.apache.org/jira/browse/HIVE-2286


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe 
  ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION 
  ql/src/test/results/clientnegative/addpart1.q.out f4da8f1 
  ql/src/test/results/clientnegative/alter_concatenate_indexed_table.q.out 
8ae1f9d 
  ql/src/test/results/clientnegative/alter_non_native.q.out 8be2c3b 
  ql/src/test/results/clientnegative/alter_view_failure.q.out 9954b66 
  ql/src/test/results/clientnegative/alter_view_failure2.q.out 5915b4f 
  ql/src/test/results/clientnegative/alter_view_failure4.q.out 97d6b18 
  ql/src/test/results/clientnegative/alter_view_failure5.q.out 2291ca6 
  ql/src/test/results/clientnegative/alter_view_failure6.q.out 03b2bc3 
  ql/src/test/results/clientnegative/alter_view_failure7.q.out d0f958c 
  ql/src/test/results/clientnegative/alter_view_failure8.q.out 4420c57 
  ql/src/test/results/clientnegative/alter_view_failure9.q.out 67306d3 
  ql/src/test/results/clientnegative/altern1.q.out c52ca04 
  ql/src/test/results/clientnegative/analyze_view.q.out 99def40 
  ql/src/test/results/clientnegative/archive1.q.out 0927686 
  ql/src/test/results/clientnegative/archive2.q.out 25baefa 
  ql/src/test/results/clientnegative/authorization_fail_1.q.out ab1abe2 
  ql/src/test/results/clientnegative/authorization_fail_3.q.out cd7ceb1 
  ql/src/test/results/clientnegative/authorization_fail_4.q.out b05f9b7 
  ql/src/test/results/clientnegative/authorization_fail_5.q.out f5bdc6a 
  ql/src/test/results/clientnegative/authorization_fail_7.q.out a52fd1c 
  ql/src/test/results/clientnegative/authorization_part.q.out 625d60c 
  ql/src/test/results/clientnegative/column_rename1.q.out 7c30e4e 
  ql/src/test/results/clientnegative/column_rename2.q.out 0ca78f9 
  ql/src/test/results/clientnegative/column_rename4.q.out f14fd48 
  ql/src/test/results/clientnegative/create_or_replace_view1.q.out 97bfa21 
  ql/src/test/results/clientnegative/create_or_replace_view2.q.out 8edac34 
  ql/src/test/results/clientnegative/create_or_replace_view4.q.out 89dd5f5 
  ql/src/test/results/clientnegative/create_or_replace_view5.q.out a0aed59 
  ql/src/test/results/clientnegative/create_or_replace_view6.q.out df44e33 
  ql/src/test/results/clientnegative/create_or_replace_view7.q.out 9356dcc 
  ql/src/test/results/clientnegative/create_or_replace_view8.q.out 4161659 
  ql/src/test/results/clientnegative/create_view_failure1.q.out 43cded4 
  ql/src/test/results/clientnegative/create_view_failure2.q.out a038067 
  ql/src/test/results/clientnegative/create_view_failure4.q.out f968569 
  ql/src/test/results/clientnegative/database_create_already_exists.q.out 
08c04f9 
  ql/src/test/results/clientnegative/database_create_invalid_name.q.out 1e58089 
  ql/src/test/results/clientnegative/database_drop_does_not_exist.q.out 80c00cd 
  ql/src/test/results/clientnegative/database_drop_not_empty.q.out baa8f37 
  ql/src/test/results/clientnegative/database_drop_not_empty_restrict.q.out 
b297a99 
  ql/src/test/results/clientnegative/database_switch_does_not_exist.q.out 
8b5674d 
  ql/src/test/results/clientnegative/drop_partition_failure.q.out 8a7c63d 
  ql/src/test/results/clientnegative/drop_table_failure2.q.out 9b63102 
  ql/src/test/results/clientnegative/drop_view_failure1.q.out 61ec927 
  ql/src/test/results/clientnegative/dyn_part3.q.out 5f4df65 
  ql/src/test/results/clientnegative/exim_00_unsupported_schema.q.out 814b742 
  ql/src/test/results/clientnegative/exim_01_nonpart_over_loaded.q.out 0351bc1 
  ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out 
d40ff27 
  ql/src/test/results/clientnegative/exim_03_nonpart_noncompat_colschema.q.out 
adff0f8 
  ql/src/test/results/clientnegative/exim_04_nonpart_noncompat_colnumber.q.out 
b84e954 
  ql/src/test/results/clientnegative/exim_05_nonpart_noncompat_coltype.q.out 
96f8452 
  ql/src/test/results/clientnegative/exim_06_nonpart_noncompat_storage.q.out 
25deaa3 
  ql/src/test/results/clientnegative/exim_07_nonpart_noncompat_ifof.q.out 
f9c3d5a 
  ql/src/test/results/clientnegative/exim_08_nonpart_noncompat_serde.q.out 
12c737a 
  ql/src/test/results/clientnegative/exim_09_nonpart_noncompat_serdeparam.q.out 
77afe3a 
  

[jira] [Updated] (HIVE-2286) ClassCastException when building index with security.authorization turned on

2011-07-26 Thread Syed S. Albiz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed S. Albiz updated HIVE-2286:


Attachment: HIVE-2286.6.patch

 ClassCastException when building index with security.authorization turned on
 

 Key: HIVE-2286
 URL: https://issues.apache.org/jira/browse/HIVE-2286
 Project: Hive
  Issue Type: Bug
Reporter: Syed S. Albiz
Assignee: Syed S. Albiz
 Attachments: HIVE-2286.1.patch, HIVE-2286.2.patch, HIVE-2286.6.patch


 When trying to build an index with authorization checks turned on, hive 
 issues the following ClassCastException:
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer cannot be cast to
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
   at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:540)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:848)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:224)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:293)
   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:385)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:392)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:567)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)





[jira] [Updated] (HIVE-2286) ClassCastException when building index with security.authorization turned on

2011-07-26 Thread Syed S. Albiz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed S. Albiz updated HIVE-2286:


Status: Patch Available  (was: Open)

 ClassCastException when building index with security.authorization turned on
 

 Key: HIVE-2286
 URL: https://issues.apache.org/jira/browse/HIVE-2286
 Project: Hive
  Issue Type: Bug
Reporter: Syed S. Albiz
Assignee: Syed S. Albiz
 Attachments: HIVE-2286.1.patch, HIVE-2286.2.patch, HIVE-2286.6.patch


 When trying to build an index with authorization checks turned on, hive 
 issues the following ClassCastException:
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer cannot be cast to
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
   at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:540)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:848)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:224)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:293)
   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:385)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:392)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:567)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)





[jira] [Commented] (HIVE-2286) ClassCastException when building index with security.authorization turned on

2011-07-26 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071428#comment-13071428
 ] 

jirapos...@reviews.apache.org commented on HIVE-2286:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1137/
---

(Updated 2011-07-26 23:28:13.279889)


Review request for hive, John Sichi and Ning Zhang.


Changes
---

refactor patch to dump query state into an inner class rather than a Stack.


Summary
---

Save the original HiveOperation/commandType when we generate the index builder 
task and restore it after we're done generating the task so that the 
authorization checks make the right decision when deciding what to do.


This addresses bug HIVE-2286.
https://issues.apache.org/jira/browse/HIVE-2286


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe 
  ql/src/test/queries/clientpositive/index_auth.q PRE-CREATION 
  ql/src/test/results/clientnegative/addpart1.q.out f4da8f1 
  ql/src/test/results/clientnegative/alter_concatenate_indexed_table.q.out 
8ae1f9d 
  ql/src/test/results/clientnegative/alter_non_native.q.out 8be2c3b 
  ql/src/test/results/clientnegative/alter_view_failure.q.out 9954b66 
  ql/src/test/results/clientnegative/alter_view_failure2.q.out 5915b4f 
  ql/src/test/results/clientnegative/alter_view_failure4.q.out 97d6b18 
  ql/src/test/results/clientnegative/alter_view_failure5.q.out 2291ca6 
  ql/src/test/results/clientnegative/alter_view_failure6.q.out 03b2bc3 
  ql/src/test/results/clientnegative/alter_view_failure7.q.out d0f958c 
  ql/src/test/results/clientnegative/alter_view_failure8.q.out 4420c57 
  ql/src/test/results/clientnegative/alter_view_failure9.q.out 67306d3 
  ql/src/test/results/clientnegative/altern1.q.out c52ca04 
  ql/src/test/results/clientnegative/analyze_view.q.out 99def40 
  ql/src/test/results/clientnegative/archive1.q.out 0927686 
  ql/src/test/results/clientnegative/archive2.q.out 25baefa 
  ql/src/test/results/clientnegative/authorization_fail_1.q.out ab1abe2 
  ql/src/test/results/clientnegative/authorization_fail_3.q.out cd7ceb1 
  ql/src/test/results/clientnegative/authorization_fail_4.q.out b05f9b7 
  ql/src/test/results/clientnegative/authorization_fail_5.q.out f5bdc6a 
  ql/src/test/results/clientnegative/authorization_fail_7.q.out a52fd1c 
  ql/src/test/results/clientnegative/authorization_part.q.out 625d60c 
  ql/src/test/results/clientnegative/column_rename1.q.out 7c30e4e 
  ql/src/test/results/clientnegative/column_rename2.q.out 0ca78f9 
  ql/src/test/results/clientnegative/column_rename4.q.out f14fd48 
  ql/src/test/results/clientnegative/create_or_replace_view1.q.out 97bfa21 
  ql/src/test/results/clientnegative/create_or_replace_view2.q.out 8edac34 
  ql/src/test/results/clientnegative/create_or_replace_view4.q.out 89dd5f5 
  ql/src/test/results/clientnegative/create_or_replace_view5.q.out a0aed59 
  ql/src/test/results/clientnegative/create_or_replace_view6.q.out df44e33 
  ql/src/test/results/clientnegative/create_or_replace_view7.q.out 9356dcc 
  ql/src/test/results/clientnegative/create_or_replace_view8.q.out 4161659 
  ql/src/test/results/clientnegative/create_view_failure1.q.out 43cded4 
  ql/src/test/results/clientnegative/create_view_failure2.q.out a038067 
  ql/src/test/results/clientnegative/create_view_failure4.q.out f968569 
  ql/src/test/results/clientnegative/database_create_already_exists.q.out 
08c04f9 
  ql/src/test/results/clientnegative/database_create_invalid_name.q.out 1e58089 
  ql/src/test/results/clientnegative/database_drop_does_not_exist.q.out 80c00cd 
  ql/src/test/results/clientnegative/database_drop_not_empty.q.out baa8f37 
  ql/src/test/results/clientnegative/database_drop_not_empty_restrict.q.out 
b297a99 
  ql/src/test/results/clientnegative/database_switch_does_not_exist.q.out 
8b5674d 
  ql/src/test/results/clientnegative/drop_partition_failure.q.out 8a7c63d 
  ql/src/test/results/clientnegative/drop_table_failure2.q.out 9b63102 
  ql/src/test/results/clientnegative/drop_view_failure1.q.out 61ec927 
  ql/src/test/results/clientnegative/dyn_part3.q.out 5f4df65 
  ql/src/test/results/clientnegative/exim_00_unsupported_schema.q.out 814b742 
  ql/src/test/results/clientnegative/exim_01_nonpart_over_loaded.q.out 0351bc1 
  ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out 
d40ff27 
  ql/src/test/results/clientnegative/exim_03_nonpart_noncompat_colschema.q.out 
adff0f8 
  ql/src/test/results/clientnegative/exim_04_nonpart_noncompat_colnumber.q.out 
b84e954 
  ql/src/test/results/clientnegative/exim_05_nonpart_noncompat_coltype.q.out 
96f8452 
  ql/src/test/results/clientnegative/exim_06_nonpart_noncompat_storage.q.out 
25deaa3 
  

[jira] [Reopened] (HIVE-401) Reduce the ant test time to under 15 minutes

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reopened HIVE-401:
-


Yesterday it took me 4 hours to run the tests on trunk.

 Reduce the ant test time to under 15 minutes
 

 Key: HIVE-401
 URL: https://issues.apache.org/jira/browse/HIVE-401
 Project: Hive
  Issue Type: Wish
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: hive_parallel_test.sh


 ant test is taking too long. This is a big overhead for development since 
 we need to do context switching all the time.
 We should bring the time back to under 15 minutes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-494) Select columns by index instead of name

2011-07-26 Thread Adam Kramer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kramer updated HIVE-494:
-

Description: 
SELECT mytable[0], mytable[2] FROM some_table_name mytable;

...should return the first and third columns, respectively, from mytable 
regardless of their column names.

The need for names specifically is kind of silly when they just get 
translated into numbers anyway.

  was:
In a very real sense, tables are like arrays or matrices with rows and columns. 
IT would be fantastic if I could refer to columns in my select statement by 
their index, rather than by their name.

SELECT mytable[0], mytable[2] FROM some_table_name mytable;

...which would then get the first and third column from mytable. We already 
have syntax like this for array data types, which I think would translate 
nicely: SELECT mytable[0][3], etc.

Or maybe I just spend too much time coding in R...

   Priority: Minor  (was: Major)
Summary: Select columns by index instead of name  (was: Select columns 
by number instead of name)

 Select columns by index instead of name
 ---

 Key: HIVE-494
 URL: https://issues.apache.org/jira/browse/HIVE-494
 Project: Hive
  Issue Type: Wish
  Components: Clients, Query Processor
Reporter: Adam Kramer
Priority: Minor
  Labels: SQL

 SELECT mytable[0], mytable[2] FROM some_table_name mytable;
 ...should return the first and third columns, respectively, from mytable 
 regardless of their column names.
 The need for names specifically is kind of silly when they just get 
 translated into numbers anyway.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2204) unable to get column names for a specific table that has '_' as part of its table name

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2204:
-

Fix Version/s: 0.8.0

 unable to get column names for a specific table that has '_' as part of its 
 table name
 --

 Key: HIVE-2204
 URL: https://issues.apache.org/jira/browse/HIVE-2204
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Mythili Gopalakrishnan
Assignee: Patrick Hunt
 Fix For: 0.8.0

 Attachments: HIVE-2204.patch


 I have a table age_group and I am trying to get the list of columns for this 
 table. As '_' and '%' have special meanings in table search patterns 
 according to the JDBC search-pattern specification, I escape the '_' in my 
 table name when I call getColumns for this single table. But Hive does not 
 return any columns. My call to getColumns is as follows:
 catalog   null
 schemaPattern %
 tableNamePattern  age\_group
 columnNamePattern  %
 If I don't escape the '_' in my tableNamePattern, I am able to get the list 
 of columns.
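For illustration, the usual way to pass a literal table name through a JDBC search pattern is to escape '_' and '%' with the driver's escape string (reported by DatabaseMetaData.getSearchStringEscape(), typically a backslash). The helper below is a hypothetical sketch for this report, not Hive code:

```java
public class PatternEscape {
    // Escape JDBC search-pattern metacharacters ('_' and '%') in a literal
    // name, using the escape string the driver reports via
    // DatabaseMetaData.getSearchStringEscape() (typically "\").
    public static String escape(String literal, String esc) {
        return literal
            .replace(esc, esc + esc)   // escape the escape string itself first
            .replace("_", esc + "_")
            .replace("%", esc + "%");
    }

    public static void main(String[] args) {
        // "age_group" becomes "age\_group" before being passed as the
        // tableNamePattern argument of DatabaseMetaData.getColumns(...).
        System.out.println(escape("age_group", "\\")); // prints age\_group
    }
}
```

Whether the escaped pattern matches then depends on the driver honoring the escape string, which is what this issue reports as broken.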

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if possible

2011-07-26 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2248:
--

Summary: Comparison Operators convert number types to common type instead 
of double if possible  (was: Comparison Operators convert number types to 
common type instead of double if necessary)

 Comparison Operators convert number types to common type instead of double if 
 possible
 --

 Key: HIVE-2248
 URL: https://issues.apache.org/jira/browse/HIVE-2248
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Siying Dong
Assignee: Siying Dong
 Fix For: 0.8.0

 Attachments: HIVE-2248.1.patch


 Now if the two sides of a comparison are of different types, we always 
 convert both to double and compare. This was a slight regression from the 
 change in https://issues.apache.org/jira/browse/HIVE-1638. The old 
 UDFOPComparison, using GenericUDFBridge, always tried to find a common type 
 first.
 The worst case is this: for WHERE BIGINT_COLUMN = 0, we always convert the 
 column and 0 to double and compare, which is wasteful, though it is usually a 
 minor cost in the system. It is easy to fix.
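 Beyond the wasted conversion the issue describes, routing BIGINT comparisons through double is also a precision hazard, since a double carries only 53 mantissa bits. A small illustrative sketch (not Hive code):

```java
public class LongDoubleCompare {
    public static void main(String[] args) {
        long a = 9007199254740993L; // 2^53 + 1
        long b = 9007199254740992L; // 2^53
        // Compared at a common integral type, the values differ:
        System.out.println(a == b);                   // false
        // Compared after converting both sides to double, the distinction
        // is lost: 2^53 + 1 is not representable and rounds to 2^53.
        System.out.println((double) a == (double) b); // true
    }
}
```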

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HIVE-2046) In error scenario some opened streams may not closed in Utilities.java

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reopened HIVE-2046:
--


 In error scenario some opened streams may not closed in Utilities.java
 --

 Key: HIVE-2046
 URL: https://issues.apache.org/jira/browse/HIVE-2046
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2046.Patch


 1) In an error scenario, XMLDecoder and XMLEncoder may not be closed in 
 serializeMapRedWork() and deserializeMapRedWork() in Utilities.java.
 2) BufferedReader is not closed in Utilities.StreamPrinter.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HIVE-2046) In error scenario some opened streams may not closed in Utilities.java

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-2046.
--

Resolution: Duplicate

 In error scenario some opened streams may not closed in Utilities.java
 --

 Key: HIVE-2046
 URL: https://issues.apache.org/jira/browse/HIVE-2046
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2046.Patch


 1) In an error scenario, XMLDecoder and XMLEncoder may not be closed in 
 serializeMapRedWork() and deserializeMapRedWork() in Utilities.java.
 2) BufferedReader is not closed in Utilities.StreamPrinter.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2051:
-

Fix Version/s: 0.8.0

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch, HIVE-2051.5.patch


 getInputSummary() now calls FileSystem.getContentSummary() one path at a 
 time, which can be extremely slow when the number of input paths is huge. By 
 issuing those calls in parallel, we can cut latency in most cases.
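 As an illustration of the general approach (not the actual patch), per-path summary calls can be fanned out over a fixed thread pool and the results collected from futures; summarize() below is a self-contained stand-in for the expensive FileSystem.getContentSummary() call:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelSummary {
    // Stand-in for an expensive filesystem call such as
    // FileSystem.getContentSummary(Path).
    static long summarize(String path) {
        return path.length();
    }

    // Submit one task per path, then collect results in input order.
    public static Map<String, Long> summarizeAll(List<String> paths, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String p : paths) {
                futures.add(pool.submit(() -> summarize(p)));
            }
            Map<String, Long> result = new LinkedHashMap<>();
            for (int i = 0; i < paths.size(); i++) {
                result.put(paths.get(i), futures.get(i).get()); // propagate failures
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(summarizeAll(Arrays.asList("/a", "/ab", "/abc"), 2));
    }
}
```

 With thousands of input paths, latency drops toward the slowest batch of calls instead of the sum of all of them.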

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HIVE-2044) In error scenario opened streams may not closed in TypedBytesWritableOutput.java

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reopened HIVE-2044:
--


 In error scenario opened streams may not closed in 
 TypedBytesWritableOutput.java
 

 Key: HIVE-2044
 URL: https://issues.apache.org/jira/browse/HIVE-2044
 Project: Hive
  Issue Type: Bug
  Components: Contrib
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2044.Patch


 1) In an error scenario, DataOutputStream may not be closed in 
 writeWritable() of TypedBytesWritableOutput.java.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-1937) DDLSemanticAnalyzer won't take newly set Hive parameters

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1937:
-

  Component/s: Query Processor
Fix Version/s: 0.8.0

 DDLSemanticAnalyzer won't take newly set Hive parameters
 

 Key: HIVE-1937
 URL: https://issues.apache.org/jira/browse/HIVE-1937
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.8.0

 Attachments: HIVE-1937.2.patch, HIVE-1937.3.patch, HIVE-1937.patch


 Hive's DDLSemanticAnalyzer maintains a static reservedPartitionValue set 
 whose values come from several Hive parameters. However, even if these 
 parameters are set to new values, reservedPartitionValue is not updated. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HIVE-1890) Optimize privilege checking for authorization

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reopened HIVE-1890:
--


 Optimize privilege checking for authorization
 -

 Key: HIVE-1890
 URL: https://issues.apache.org/jira/browse/HIVE-1890
 Project: Hive
  Issue Type: Improvement
  Components: Security
Reporter: Namit Jain
Assignee: He Yongqiang

 Follow-up of HIVE-78.
 Many queries have lots of input partitions for the same input table.
 If the table under consideration has the same privilege for all of its 
 partitions, there is no need to check permissions for every partition: find 
 the common tables and skip the partitions altogether.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HIVE-1890) Optimize privilege checking for authorization

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach resolved HIVE-1890.
--

Resolution: Duplicate

 Optimize privilege checking for authorization
 -

 Key: HIVE-1890
 URL: https://issues.apache.org/jira/browse/HIVE-1890
 Project: Hive
  Issue Type: Improvement
  Components: Security
Reporter: Namit Jain
Assignee: He Yongqiang

 Follow-up of HIVE-78.
 Many queries have lots of input partitions for the same input table.
 If the table under consideration has the same privilege for all of its 
 partitions, there is no need to check permissions for every partition: find 
 the common tables and skip the partitions altogether.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1644:
-

Fix Version/s: 0.8.0

 use filter pushdown for automatically accessing indexes
 ---

 Key: HIVE-1644
 URL: https://issues.apache.org/jira/browse/HIVE-1644
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: John Sichi
Assignee: Russell Melick
 Fix For: 0.8.0

 Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, 
 HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.13.patch, 
 HIVE-1644.14.patch, HIVE-1644.15.patch, HIVE-1644.16.patch, 
 HIVE-1644.17.patch, HIVE-1644.18.patch, HIVE-1644.19.patch, 
 HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, 
 HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch, 
 hive.log


 HIVE-1226 provides utilities for analyzing filters which have been pushed 
 down to a table scan.  The next step is to use these for selecting available 
 indexes and generating access plans for those indexes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1595:
-

Fix Version/s: 0.8.0

 job name for alter table T archive partition P is not correct
 -

 Key: HIVE-1595
 URL: https://issues.apache.org/jira/browse/HIVE-1595
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Sohan Jain
 Fix For: 0.8.0

 Attachments: Hive-1595.1.patch, Hive-1595.2.patch


 For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which 
 makes the job difficult to identify.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HIVE-1490) More implicit type conversion: UNION ALL and COALESCE

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reopened HIVE-1490:
--


 More implicit type conversion: UNION ALL and COALESCE
 -

 Key: HIVE-1490
 URL: https://issues.apache.org/jira/browse/HIVE-1490
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Server Infrastructure
Reporter: Adam Kramer
Assignee: Syed S. Albiz

 This is a usecase that frequently annoys me:
 SELECT TRANSFORM(stuff)
 USING 'script'
 AS thing1, thing2
 FROM some_table
 UNION ALL
 SELECT a.thing1, a.thing2
 FROM some_other_table a
 ...this fails when a.thing1 and a.thing2 are anything but STRING, because all 
 output of TRANSFORM is STRING.
 In this case, a.thing1 and a.thing2 should be implicitly converted to string.
 COALESCE(a.thing1, a.thing2, a.thing3) should similarly do implicit type 
 conversion among the arguments. If two are INT and one is BIGINT, upgrade the 
 INTs, etc.
 At the very least, it would be nice to have syntax like
 SELECT TRANSFORM(stuff)
 USING 'script'
 AS thing1 INT, thing2 INT
 ...which would effectively cast the output column to the specified type. But 
 really, type conversion should work.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2199) incorrect success flag passed to jobClose

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2199:
-

Fix Version/s: 0.8.0

 incorrect success flag passed to jobClose
 -

 Key: HIVE-2199
 URL: https://issues.apache.org/jira/browse/HIVE-2199
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Franklin Hu
Assignee: Franklin Hu
Priority: Minor
 Fix For: 0.8.0

 Attachments: hive-2199.1.patch


 For block level merging of RCFiles, jobClose is passed the incorrect variable 
 as the success flag

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2024) In Driver.execute(), mapred.job.tracker is not restored if one of the task fails.

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2024:
-

  Component/s: Query Processor
Fix Version/s: 0.8.0

 In Driver.execute(), mapred.job.tracker is not restored if one of the task 
 fails.
 -

 Key: HIVE-2024
 URL: https://issues.apache.org/jira/browse/HIVE-2024
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Siying Dong
Assignee: Siying Dong
 Fix For: 0.8.0

 Attachments: HIVE-2024.1.patch


 If a job is automatically determined to run in local mode and the task fails 
 with a non-zero error code, mapred.job.tracker will remain set to local and 
 might cause further problems.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2052) PostHook and PreHook API to add flag to indicate it is pre or post hook plus cache for content summary

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2052:
-

Fix Version/s: 0.8.0

 PostHook and PreHook API to add flag to indicate it is pre or post hook plus 
 cache for content summary
 --

 Key: HIVE-2052
 URL: https://issues.apache.org/jira/browse/HIVE-2052
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2051.3.patch, HIVE-2052.1.patch, HIVE-2052.2.patch, 
 HIVE-2052.3.patch


 This will allow hooks to share information better and reduce their latency.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2082:
-

  Component/s: Query Processor
Fix Version/s: 0.8.0

 Reduce memory consumption in preparing MapReduce job
 

 Key: HIVE-2082
 URL: https://issues.apache.org/jira/browse/HIVE-2082
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.8.0

 Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch


 The Hive client side consumes a lot of memory when the number of input 
 partitions is large. One reason is that each partition maintains a list of 
 FieldSchema objects intended to deal with schema evolution. However, they are 
 not used currently and Hive uses the table-level schema for all partitions. 
 This will be fixed in HIVE-2050. The memory consumption by this part will be 
 reduced by almost half (1.2GB to 700MB for 20k partitions). 
 Another large chunk of memory is consumed in the MapReduce job setup phase 
 when a PartitionDesc is created from each Partition object. A property object 
 is maintained in PartitionDesc which contains a full list of columns and 
 types. For the same reason, these should be the same as the table-level 
 schema. Also, the deserializer initialization takes a large amount of memory, 
 which should be avoided. My initial testing of these optimizations cut the 
 memory consumption in half (700MB to 300MB for 20k partitions). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-178) SELECT without FROM should assume a one-row table with no columns.

2011-07-26 Thread Adam Kramer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Kramer updated HIVE-178:
-

Component/s: Testing Infrastructure
Description: 
SELECT 1+1;

should just return '2', but instead hive fails because no table is listed.

SELECT 1+1 FROM (empty table);

should also just return '2', but instead hive succeeds because there is no 
possible output, so it produces no output.

So, currently we have to run 

SELECT 1+1 FROM (silly one-row dummy table);

...which runs a whole mapreduce step to ignore a column of data that is useless 
anyway. This is much easier due to local mode, but still, it would be nice to 
be able to SELECT without specifying a table and to get one row of output in 
moments instead of waiting for even a local-mode job to launch, complete, and 
return.

This is especially useful for testing UDFs.

Relatedly, an optimization by which Hive can tell that data from a table isn't 
even USED would be useful, because it means that the data needn't be 
queried...the only relevant info from the table would be the number of rows it 
has, which is available for free from the metastore.

  was:
SELECT 1+1;

should just return '2', but instead hive fails because no table is listed.

SELECT 1+1 FROM (empty table);

should also just return '2', but instead hive succeeds because there is no 
possible output, so it produces no output.

So, currently we have to run 

SELECT 1+1 FROM (silly one-row dummy table);

...which runs a whole mapreduce step to ignore a column of data that is useless 
anyway. This is much easier due to local mode, but still, it would be nice to 
be able to SELECT without specifying a table and to get one row of output in 
moments instead of waiting for even a local-mode job to launch, complete, and 
return.

Relatedly, an optimization by which Hive can tell that data from a table isn't 
even USED would be useful, because it means that the data needn't be 
queried...the only relevant info from the table would be the number of rows it 
has, which is available for free from the metastore.


 SELECT without FROM should assume a one-row table with no columns.
 --

 Key: HIVE-178
 URL: https://issues.apache.org/jira/browse/HIVE-178
 Project: Hive
  Issue Type: Wish
  Components: Query Processor, Testing Infrastructure
Reporter: Adam Kramer
Priority: Minor
  Labels: SQL

 SELECT 1+1;
 should just return '2', but instead hive fails because no table is listed.
 SELECT 1+1 FROM (empty table);
 should also just return '2', but instead hive succeeds because there is no 
 possible output, so it produces no output.
 So, currently we have to run 
 SELECT 1+1 FROM (silly one-row dummy table);
 ...which runs a whole mapreduce step to ignore a column of data that is 
 useless anyway. This is much easier due to local mode, but still, it would be 
 nice to be able to SELECT without specifying a table and to get one row of 
 output in moments instead of waiting for even a local-mode job to launch, 
 complete, and return.
 This is especially useful for testing UDFs.
 Relatedly, an optimization by which Hive can tell that data from a table 
 isn't even USED would be useful, because it means that the data needn't be 
 queried...the only relevant info from the table would be the number of rows 
 it has, which is available for free from the metastore.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2096) throw a error if the input is larger than a threshold for index input format

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2096:
-

  Component/s: Query Processor
   Diagnosability
Fix Version/s: 0.8.0

 throw a error if the input is larger than a threshold for index input format
 

 Key: HIVE-2096
 URL: https://issues.apache.org/jira/browse/HIVE-2096
 Project: Hive
  Issue Type: Bug
  Components: Diagnosability, Query Processor
Affects Versions: 0.8.0
Reporter: Namit Jain
 Fix For: 0.8.0

 Attachments: HIVE-2096.1.patch.txt, HIVE-2096.2.patch.txt, 
 HIVE-2096.3.patch.txt, HIVE-2096.4.patch.txt


 This can hang forever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HIVE-2096) throw a error if the input is larger than a threshold for index input format

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-2096:


Assignee: Wojciech Galuba

 throw a error if the input is larger than a threshold for index input format
 

 Key: HIVE-2096
 URL: https://issues.apache.org/jira/browse/HIVE-2096
 Project: Hive
  Issue Type: Bug
  Components: Diagnosability, Query Processor
Affects Versions: 0.8.0
Reporter: Namit Jain
Assignee: Wojciech Galuba
 Fix For: 0.8.0

 Attachments: HIVE-2096.1.patch.txt, HIVE-2096.2.patch.txt, 
 HIVE-2096.3.patch.txt, HIVE-2096.4.patch.txt


 This can hang forever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2106) Increase the number of operator counter

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2106:
-

Fix Version/s: 0.8.0

 Increase the number of operator counter 
 

 Key: HIVE-2106
 URL: https://issues.apache.org/jira/browse/HIVE-2106
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.8.0

 Attachments: HIVE-2106.patch


 Currently Hadoop counters have to be defined as an enum (hardcoded), and we 
 support up to 400 counters now. This limits the number of operators to 100 
 (each operator has 4 counters). We need to increase the number of Hadoop 
 counters or change the Hive code to use the Hadoop 0.20 API. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2186:
-

Fix Version/s: 0.8.0

 Dynamic Partitioning Failing because of characters not supported globStatus
 ---

 Key: HIVE-2186
 URL: https://issues.apache.org/jira/browse/HIVE-2186
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Siying Dong
Assignee: Franklin Hu
 Fix For: 0.8.0

 Attachments: hive-2186.1.patch, hive-2186.2.patch, hive-2186.3.patch, 
 hive-2186.4.patch, hive-2186.5.patch


 Some dynamic partition queries fail at the partition-loading stage if the 
 dynamic partition column values contain special characters. We need to escape 
 all of them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2159) TableSample(percent ) uses one intermediate size to be int, which overflows for large sampled size, making the sampling never triggered.

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2159:
-

  Component/s: Query Processor
Fix Version/s: 0.8.0

 TableSample(percent ) uses one intermediate size to be int, which overflows 
 for large sampled size, making the sampling never triggered.
 

 Key: HIVE-2159
 URL: https://issues.apache.org/jira/browse/HIVE-2159
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Siying Dong
Assignee: Siying Dong
 Fix For: 0.8.0

 Attachments: HIVE-2159.1.patch
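
 To illustrate the failure mode the title describes (a hypothetical sketch, 
 not the actual Hive code): narrowing a long byte count to int wraps once the 
 value exceeds 2^31-1, so a sampled-size threshold check can silently stop 
 triggering:

```java
public class SampleSizeOverflow {
    public static void main(String[] args) {
        long totalSize = 5L * 1024 * 1024 * 1024; // 5 GB of input data
        // Narrowing the intermediate size to int keeps only the low 32 bits:
        int sampledInt = (int) (totalSize / 2);
        long sampledLong = totalSize / 2;
        System.out.println(sampledInt);  // -1610612736 (wrapped)
        System.out.println(sampledLong); // 2684354560
        // A check like "sampledInt >= threshold" is now always false for
        // large inputs, so the sampling path never triggers.
    }
}
```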




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2121) Input Sampling By Splits

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2121:
-

  Component/s: Query Processor
Fix Version/s: 0.8.0

 Input Sampling By Splits
 

 Key: HIVE-2121
 URL: https://issues.apache.org/jira/browse/HIVE-2121
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Siying Dong
Assignee: Siying Dong
 Fix For: 0.8.0

 Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, 
 HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch, HIVE-2121.7.patch, 
 HIVE-2121.8.patch


 We need better input sampling to serve at least two purposes:
 1. testing queries against a smaller data set
 2. understanding what the data looks like without scanning the whole table.
 A simple function that returns a subset of the splits will help in those 
 cases. It doesn't have to be strict sampling.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2157) NPE in MapJoinObjectKey

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2157:
-

  Component/s: Query Processor
Fix Version/s: 0.8.0

 NPE in MapJoinObjectKey
 ---

 Key: HIVE-2157
 URL: https://issues.apache.org/jira/browse/HIVE-2157
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.8.0

 Attachments: HIVE-2157.1.patch




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2262) mapjoin followed by union all, groupby does not work

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2262:
-

Fix Version/s: (was: 0.7.1)

 mapjoin followed by union all, groupby does not work
 

 Key: HIVE-2262
 URL: https://issues.apache.org/jira/browse/HIVE-2262
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.1
Reporter: yu xiang
Priority: Trivial

 sql:
 CREATE TABLE nulltest2(int_data1 INT, int_data2 INT, boolean_data BOOLEAN, 
 double_data DOUBLE, string_data STRING) ROW FORMAT DELIMITED FIELDS 
 TERMINATED BY ',';
 CREATE TABLE nulltest3(int_data1 INT) ROW FORMAT DELIMITED FIELDS TERMINATED 
 BY ',';
 explain select int_data2,count(1) from (select /*+mapjoin(a)*/ int_data2, 1 
 as c1, 0 as c2 from nulltest2 a join nulltest3 b on(a.int_data1 = 
 b.int_data1) union all select /*+mapjoin(a)*/ int_data2, 1 as c1, 2 as c2 
 from nulltest2 a join nulltest3 b on(a.int_data1 = b.int_data1)) mapjointable 
 group by int_data2;
 exception:
 FAILED: Hive Internal Error: java.lang.NullPointerException(null)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:156)
 at 
 org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:551)
 at 
 org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514)
 at 
 org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:125)
 at 
 org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:76)
 at 
 org.apache.hadoop.hive.ql.optimizer.GenMRRedSink3.process(GenMRRedSink3.java:64)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
 Analysis of the cause:
 1. When mapjoin, union, and groupby are used together, 
 UnionProcFactory.MapJoinUnion() (in the optimizer) sets MapJoinSubq to true 
 and sets up the UnionParseContext.
 2. In GenMRUnion1, Hive calls mergeMapJoinUnion, which also sets the task plan.
 3. In GenMRRedSink3, Hive checks uCtx.isMapOnlySubq() and calls 
 GenMRRedSink1().process() to initialize the plan. But the utask's plan has 
 already been set; it only needs the reducer to be set. Also, the utask is 
 processing a temporary table, so there is no topOp mapping to a table, and we 
 get the NullPointerException.
 Solutions:
 1. SQL solution: rewrite the query with a subquery.
 2. Code solution: in mergeMapJoinUnion, after the task plan has been set, set 
 a setTaskPlan flag to true to indicate that the plan for this utask has 
 already been set. Then in GenMRRedSink3, if this flag is true, don't call 
 GenMRRedSink1().process() to reinitialize the plan:
 
 if (uCtx.isMapOnlySubq() && !upc.isIssetTaskPlan())
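 The proposed flag-based guard can be illustrated with a minimal standalone
 sketch. All class and method names below are hypothetical stand-ins, not
 Hive's actual GenMRRedSink3/UnionParseContext code; it only shows the
 idempotent-initialization pattern the condition above expresses:

```java
// Toy model of the proposed guard: record when a union task's plan has been
// initialized, and skip re-initialization on later optimizer visits.
// All names are hypothetical illustrations, not Hive's real optimizer code.
public class PlanInitGuard {
    private boolean mapOnlySubq = false;   // stands in for uCtx.isMapOnlySubq()
    private boolean taskPlanSet = false;   // stands in for upc.isIssetTaskPlan()
    private int initCount = 0;             // how many times the plan was built

    // Stands in for mergeMapJoinUnion: the plan is set here, once.
    void mergeMapJoinUnion() {
        initCount++;
        taskPlanSet = true;
        mapOnlySubq = true;
    }

    // Stands in for GenMRRedSink3: only (re)initialize the plan for a
    // map-only subquery whose plan has NOT already been set.
    void processRedSink() {
        if (mapOnlySubq && !taskPlanSet) {
            initCount++;
            taskPlanSet = true;
        }
        // otherwise: only the reducer would need to be set; no re-init.
    }

    int getInitCount() {
        return initCount;
    }

    public static void main(String[] args) {
        PlanInitGuard ctx = new PlanInitGuard();
        ctx.mergeMapJoinUnion();  // plan initialized once
        ctx.processRedSink();     // guard prevents a second initialization
        System.out.println(ctx.getInitCount());
    }
}
```

 With the guard in place the plan is built exactly once, which is the effect
 the code solution is after.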
 
 I don't know whether the code solution is suitable.
 Is there any better solution?
 thx





[jira] [Updated] (HIVE-2306) Hbase's timestamp attribute to be mapped for read or write, and then import data of timestamp to hbase's table from hive

2011-07-26 Thread Jianyi Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianyi Zhang updated HIVE-2306:
---

Description: 
Current column mapping doesn't support mapping HBase's timestamp column for 
read or write, or importing timestamped data into an HBase table from Hive.

I found that HIVE-1228 mentioned this issue, but the :timestamp requirement 
was ultimately not addressed. And 
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration says that 
there is currently no way to access the HBase timestamp attribute, and queries 
always access data with the latest timestamp.

Would it be possible to map the timestamp to Hive (just like Get in the HBase 
API) or to INSERT OVERWRITE TABLE hbase_table_1 with a timestamp from Hive 
(like Put in the HBase API)?

  was:
Current column mapping dosn't support hbase's timestamp column to be mapped for 
read or write, and import data of timestamp to hbase's table from hive.

I find HIVE-1228 mentioned this issue,but not to address the :timestamp 
requirement at last. And 
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration said that 
there is currently no way to access the HBase timestamp attribute, and queries 
always access data with the latest timestamp.

This would allow timestamp to be map to hive(just like Gut in hbase API) or 
INSERT OVERWRITE TABLE hbase_table_1 with timestamp from hive(like Put in 
hbase API)?


 Hbase's timestamp attribute to be mapped for read or write, and then import 
 data of timestamp to hbase's table from hive
 

 Key: HIVE-2306
 URL: https://issues.apache.org/jira/browse/HIVE-2306
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Jianyi Zhang
   Original Estimate: 96h
  Remaining Estimate: 96h

 Current column mapping doesn't support mapping HBase's timestamp column for 
 read or write, or importing timestamped data into an HBase table from Hive.
 I found that HIVE-1228 mentioned this issue, but the :timestamp requirement 
 was ultimately not addressed. And 
 https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration says that 
 there is currently no way to access the HBase timestamp attribute, and 
 queries always access data with the latest timestamp.
 Would it be possible to map the timestamp to Hive (just like Get in the HBase 
 API) or to INSERT OVERWRITE TABLE hbase_table_1 with a timestamp from Hive 
 (like Put in the HBase API)?
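 The request boils down to HBase's versioned-cell semantics: each cell stores
 multiple values keyed by timestamp, reads default to the newest version, and
 a timestamp-aware mapping would let Hive read or write a specific version.
 A minimal standalone model of that behavior in plain Java (a toy sketch only;
 the real HBase client API is different):

```java
import java.util.Collections;
import java.util.TreeMap;

// Toy model of one HBase cell: values versioned by timestamp.
// Plain-Java illustration only; not the actual HBase client API.
public class VersionedCell {
    // Descending key order so the first entry is the newest version.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());

    // Analogous to a Put with an explicit timestamp.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // What the Hive HBase handler does today: always read the latest version.
    String getLatest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // What a timestamp mapping would enable: read a specific version,
    // analogous to a Get constrained to one timestamp.
    String getAt(long timestamp) {
        return versions.get(timestamp);
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(100L, "old");
        cell.put(200L, "new");
        System.out.println(cell.getLatest()); // newest version
        System.out.println(cell.getAt(100L)); // explicitly requested version
    }
}
```

 Without the mapping, Hive can only ever see the getLatest() view of each cell.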
