[jira] [Updated] (HIVE-2080) Few code improvements in the ql and serde packages.

2011-05-19 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2080:
---

Status: Patch Available  (was: Open)

 Few code improvements in the ql and serde packages.
 ---

 Key: HIVE-2080
 URL: https://issues.apache.org/jira/browse/HIVE-2080
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Serializers/Deserializers
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive 0.7.0, and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2080.Patch


 Few code improvements in the ql and serde packages:
 1) Small performance improvements
 2) Null checks to avoid NPEs
 3) More effective variable management

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-2169) Hive should have support for clover and findbugs

2011-05-19 Thread Iyappan (JIRA)
Hive should have support for clover and findbugs


 Key: HIVE-2169
 URL: https://issues.apache.org/jira/browse/HIVE-2169
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Iyappan
Priority: Minor
 Fix For: 0.7.1


Hive should have support for Clover and FindBugs.

Clover delivers actionable Java code coverage metrics to assess the impact of 
unit tests.
FindBugs is a bug pattern detector for Java.
Both give useful information on code coverage and potential bugs.
Clover and FindBugs support should be added as Ant targets.



[jira] [Updated] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name

2011-05-19 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-1996:
---

Attachment: HIVE-1996.Patch

 LOAD DATA INPATH fails when the table already contains a file of the same 
 name
 

 Key: HIVE-1996
 URL: https://issues.apache.org/jira/browse/HIVE-1996
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1996.Patch


 Steps:
 1. From the command line copy the kv2.txt data file into the current user's 
 HDFS directory:
 {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt 
 kv2.txt}}
 2. In Hive, create the table:
 {{create table tst_src1 (key_ int, value_ string);}}
 3. Load the data into the table from HDFS:
 {{load data inpath './kv2.txt' into table tst_src1;}}
 4. Repeat step 1
 5. Repeat step 3
 Expected:
 To have kv2.txt renamed in HDFS and then copied to the destination as per 
 HIVE-307.
 Actual:
 File is renamed, but {{Hive.copyFiles}} doesn't see the change in {{srcs}} 
 as it continues to use the same array elements (with the un-renamed, old file 
 names). It crashes with this error:
 {noformat}
 java.lang.NullPointerException
 at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
 at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
 at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
 at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 {noformat}



[jira] [Updated] (HIVE-2111) NullPointerException on select * with table using RegexSerDe and partitions

2011-05-19 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2111:
---

Attachment: HIVE-2111.patch

 NullPointerException on select * with table using RegexSerDe and partitions
 ---

 Key: HIVE-2111
 URL: https://issues.apache.org/jira/browse/HIVE-2111
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.7.0
 Environment: Amazon Elastic Mapreduce
Reporter: Marc Harris
 Attachments: HIVE-2111.patch


 When querying against a table that is partitioned, and uses RegexSerde, 
 select with explicit columns works, but select * results in a 
 NullPointerException
 To reproduce:
 1) create a table containing the following text (notice the blank line):
 start
 fillerdatafillerdatafiller
 fillerdata2fillerdata2filler
 =end=
 2) copy the file to hdfs:
 hadoop dfs -put foo.txt test/part1=x/foo.txt
 3) run the following hive commands to create a table:
 add jar s3://elasticmapreduce/samples/hive/jars/hive_contrib.jar;
 drop table test;
 create external table test(col1 STRING, col2 STRING) 
 partitioned by (part1 STRING) 
 row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
 with serdeproperties ( input.regex = ^\(.*data\)\(.*data\).*$) 
 stored as textfile 
 location 'hdfs:///user/hadoop/test';
 alter table test add partition (part1='x');
 (Note that the text processor seems to have mangled the regex a bit. Inside 
 each pair of parentheses should be dot star data. After the second pair of 
 parentheses should be dot star dollar).
 4) select from it with explicit columns:
 select part1, col1, col2 from test;
 outputs:
 OK
 x fillerdata  fillerdata
 x NULLNULL
 x fillerdata  2fillerdata
 5) select from it with * columns
 select * from test;
 outputs:
 Failed with exception java.io.IOException:java.lang.NullPointerException
 11/04/12 14:28:27 ERROR CliDriver: Failed with exception 
 java.io.IOException:java.lang.NullPointerException
 java.io.IOException: java.lang.NullPointerException
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:149)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1039)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:398)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.NullPointerException
   at java.util.ArrayList.addAll(ArrayList.java:472)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:141)
   ... 10 more



[jira] [Updated] (HIVE-1884) Potential risk of resource leaks in Hive

2011-05-19 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-1884:
---

Attachment: HIVE-1884.2.patch

 Potential risk of resource leaks in Hive
 

 Key: HIVE-1884
 URL: https://issues.apache.org/jira/browse/HIVE-1884
 Project: Hive
  Issue Type: Bug
  Components: CLI, Metastore, Query Processor, Server Infrastructure
Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0
 Environment: Hive 0.6.0, Hadoop 0.20.1
 SUSE Linux Enterprise Server 11 (i586)
Reporter: Mohit Sikri
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1884.1.PATCH, HIVE-1884.2.patch


 h3.There are a couple of resource leaks.
 h4.For example,
 In CliDriver.java, method processReader(), the buffered reader is not closed.
 h3.There is also a risk of resources being leaked; in such cases we need to 
 refactor the code to move the closing of resources into a finally block.
 h4.For example,
 In Throttle.java, method checkJobTracker(), the following code snippet 
 might cause a resource leak.
 {code}
 InputStream in = url.openStream();
 in.read(buffer);
 in.close();
 {code}
 Ideally, per best coding practices, it should be:
 {code}
 InputStream in = null;
 try {
   in = url.openStream();
   int numRead = in.read(buffer);
 } finally {
   IOUtils.closeStream(in);
 }
 {code}
 Similar cases were found in ExplainTask.java, DDLTask.java, etc. We need to 
 refactor all such occurrences.
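The null-safe, close-in-finally idiom described above can be wrapped in a small reusable helper. The following is a hypothetical sketch mirroring the behavior of Hadoop's IOUtils.closeStream, not the attached patch; the class and method names are illustrative:

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseQuietly {
    // Null-safe, exception-swallowing close, suitable for a finally block.
    public static void close(Closeable c) {
        if (c == null) {
            return; // null check avoids an NPE when openStream() itself failed
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // a failed close should not mask the exception from the try block
        }
    }

    public static void main(String[] args) {
        close(null); // safe even when the resource was never opened
        System.out.println("closed quietly");
    }
}
```

Swallowing the close-time IOException is a deliberate choice here: the caller cares about the exception thrown while reading, not about a secondary failure while releasing the stream.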



Review Request: Potential risk of resource leaks in Hive

2011-05-19 Thread chinnarao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/758/
---

Review request for hive.


Summary
---

Potential risk of resource leaks in Hive


This addresses bug HIVE-1884.
https://issues.apache.org/jira/browse/HIVE-1884


Diffs
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1124130 
  
trunk/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableInput.java
 1124130 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1124130 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java 1124130 

Diff: https://reviews.apache.org/r/758/diff


Testing
---

All tests passed


Thanks,

chinna



[jira] [Commented] (HIVE-1884) Potential risk of resource leaks in Hive

2011-05-19 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036187#comment-13036187
 ] 

jirapos...@reviews.apache.org commented on HIVE-1884:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/758/
---

Review request for hive.


Summary
---

Potential risk of resource leaks in Hive


This addresses bug HIVE-1884.
https://issues.apache.org/jira/browse/HIVE-1884


Diffs
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1124130 
  
trunk/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableInput.java
 1124130 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1124130 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java 1124130 

Diff: https://reviews.apache.org/r/758/diff


Testing
---

All tests passed


Thanks,

chinna



 Potential risk of resource leaks in Hive
 

 Key: HIVE-1884
 URL: https://issues.apache.org/jira/browse/HIVE-1884
 Project: Hive
  Issue Type: Bug
  Components: CLI, Metastore, Query Processor, Server Infrastructure
Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0
 Environment: Hive 0.6.0, Hadoop 0.20.1
 SUSE Linux Enterprise Server 11 (i586)
Reporter: Mohit Sikri
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1884.1.PATCH, HIVE-1884.2.patch


 h3.There are a couple of resource leaks.
 h4.For example,
 In CliDriver.java, method processReader(), the buffered reader is not closed.
 h3.There is also a risk of resources being leaked; in such cases we need to 
 refactor the code to move the closing of resources into a finally block.
 h4.For example,
 In Throttle.java, method checkJobTracker(), the following code snippet 
 might cause a resource leak.
 {code}
 InputStream in = url.openStream();
 in.read(buffer);
 in.close();
 {code}
 Ideally, per best coding practices, it should be:
 {code}
 InputStream in = null;
 try {
   in = url.openStream();
   int numRead = in.read(buffer);
 } finally {
   IOUtils.closeStream(in);
 }
 {code}
 Similar cases were found in ExplainTask.java, DDLTask.java, etc. We need to 
 refactor all such occurrences.



[jira] [Assigned] (HIVE-2111) NullPointerException on select * with table using RegexSerDe and partitions

2011-05-19 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam reassigned HIVE-2111:
--

Assignee: Chinna Rao Lalam

 NullPointerException on select * with table using RegexSerDe and partitions
 ---

 Key: HIVE-2111
 URL: https://issues.apache.org/jira/browse/HIVE-2111
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.7.0
 Environment: Amazon Elastic Mapreduce
Reporter: Marc Harris
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2111.patch


 When querying against a table that is partitioned, and uses RegexSerde, 
 select with explicit columns works, but select * results in a 
 NullPointerException
 To reproduce:
 1) create a table containing the following text (notice the blank line):
 start
 fillerdatafillerdatafiller
 fillerdata2fillerdata2filler
 =end=
 2) copy the file to hdfs:
 hadoop dfs -put foo.txt test/part1=x/foo.txt
 3) run the following hive commands to create a table:
 add jar s3://elasticmapreduce/samples/hive/jars/hive_contrib.jar;
 drop table test;
 create external table test(col1 STRING, col2 STRING) 
 partitioned by (part1 STRING) 
 row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
 with serdeproperties ( input.regex = ^\(.*data\)\(.*data\).*$) 
 stored as textfile 
 location 'hdfs:///user/hadoop/test';
 alter table test add partition (part1='x');
 (Note that the text processor seems to have mangled the regex a bit. Inside 
 each pair of parentheses should be dot star data. After the second pair of 
 parentheses should be dot star dollar).
 4) select from it with explicit columns:
 select part1, col1, col2 from test;
 outputs:
 OK
 x fillerdata  fillerdata
 x NULLNULL
 x fillerdata  2fillerdata
 5) select from it with * columns
 select * from test;
 outputs:
 Failed with exception java.io.IOException:java.lang.NullPointerException
 11/04/12 14:28:27 ERROR CliDriver: Failed with exception 
 java.io.IOException:java.lang.NullPointerException
 java.io.IOException: java.lang.NullPointerException
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:149)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1039)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:398)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.NullPointerException
   at java.util.ArrayList.addAll(ArrayList.java:472)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:141)
   ... 10 more



[jira] [Commented] (HIVE-2111) NullPointerException on select * with table using RegexSerDe and partitions

2011-05-19 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036191#comment-13036191
 ] 

Chinna Rao Lalam commented on HIVE-2111:


Whenever the regular expression does not match, the SerDe returns null, but it 
should return a row of nulls. The same thing happens with an empty row.
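The intended behavior can be sketched as follows. This is a hypothetical illustration of the fix idea, not the attached patch; the method name deserialize and the explicit column count are assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NullRowSketch {
    // On a non-matching (or empty) line, return a row of per-column nulls
    // instead of a null row object, so callers like select * never see null.
    static List<String> deserialize(Pattern p, String line, int numColumns) {
        Matcher m = p.matcher(line);
        List<String> row = new ArrayList<>();
        if (!m.matches()) {
            for (int i = 0; i < numColumns; i++) {
                row.add(null); // one NULL per declared column
            }
            return row;
        }
        for (int i = 1; i <= numColumns; i++) {
            row.add(m.group(i)); // capture groups map to columns
        }
        return row;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("^(.*data)(.*data).*$");
        System.out.println(deserialize(p, "", 2));
        System.out.println(deserialize(p, "fillerdatafillerdatafiller", 2));
    }
}
```

A non-null row of nulls keeps downstream ObjectInspectors (e.g. the UnionStructObjectInspector path in the stack trace) from hitting ArrayList.addAll(null).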

 NullPointerException on select * with table using RegexSerDe and partitions
 ---

 Key: HIVE-2111
 URL: https://issues.apache.org/jira/browse/HIVE-2111
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.7.0
 Environment: Amazon Elastic Mapreduce
Reporter: Marc Harris
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2111.patch


 When querying against a table that is partitioned, and uses RegexSerde, 
 select with explicit columns works, but select * results in a 
 NullPointerException
 To reproduce:
 1) create a table containing the following text (notice the blank line):
 start
 fillerdatafillerdatafiller
 fillerdata2fillerdata2filler
 =end=
 2) copy the file to hdfs:
 hadoop dfs -put foo.txt test/part1=x/foo.txt
 3) run the following hive commands to create a table:
 add jar s3://elasticmapreduce/samples/hive/jars/hive_contrib.jar;
 drop table test;
 create external table test(col1 STRING, col2 STRING) 
 partitioned by (part1 STRING) 
 row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
 with serdeproperties ( input.regex = ^\(.*data\)\(.*data\).*$) 
 stored as textfile 
 location 'hdfs:///user/hadoop/test';
 alter table test add partition (part1='x');
 (Note that the text processor seems to have mangled the regex a bit. Inside 
 each pair of parentheses should be dot star data. After the second pair of 
 parentheses should be dot star dollar).
 4) select from it with explicit columns:
 select part1, col1, col2 from test;
 outputs:
 OK
 x fillerdata  fillerdata
 x NULLNULL
 x fillerdata  2fillerdata
 5) select from it with * columns
 select * from test;
 outputs:
 Failed with exception java.io.IOException:java.lang.NullPointerException
 11/04/12 14:28:27 ERROR CliDriver: Failed with exception 
 java.io.IOException:java.lang.NullPointerException
 java.io.IOException: java.lang.NullPointerException
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:149)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1039)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:398)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.NullPointerException
   at java.util.ArrayList.addAll(ArrayList.java:472)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:141)
   ... 10 more



[jira] [Updated] (HIVE-2111) NullPointerException on select * with table using RegexSerDe and partitions

2011-05-19 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2111:
---

Status: Patch Available  (was: Open)

 NullPointerException on select * with table using RegexSerDe and partitions
 ---

 Key: HIVE-2111
 URL: https://issues.apache.org/jira/browse/HIVE-2111
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.7.0
 Environment: Amazon Elastic Mapreduce
Reporter: Marc Harris
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2111.patch


 When querying against a table that is partitioned, and uses RegexSerde, 
 select with explicit columns works, but select * results in a 
 NullPointerException
 To reproduce:
 1) create a table containing the following text (notice the blank line):
 start
 fillerdatafillerdatafiller
 fillerdata2fillerdata2filler
 =end=
 2) copy the file to hdfs:
 hadoop dfs -put foo.txt test/part1=x/foo.txt
 3) run the following hive commands to create a table:
 add jar s3://elasticmapreduce/samples/hive/jars/hive_contrib.jar;
 drop table test;
 create external table test(col1 STRING, col2 STRING) 
 partitioned by (part1 STRING) 
 row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
 with serdeproperties ( input.regex = ^\(.*data\)\(.*data\).*$) 
 stored as textfile 
 location 'hdfs:///user/hadoop/test';
 alter table test add partition (part1='x');
 (Note that the text processor seems to have mangled the regex a bit. Inside 
 each pair of parentheses should be dot star data. After the second pair of 
 parentheses should be dot star dollar).
 4) select from it with explicit columns:
 select part1, col1, col2 from test;
 outputs:
 OK
 x fillerdata  fillerdata
 x NULLNULL
 x fillerdata  2fillerdata
 5) select from it with * columns
 select * from test;
 outputs:
 Failed with exception java.io.IOException:java.lang.NullPointerException
 11/04/12 14:28:27 ERROR CliDriver: Failed with exception 
 java.io.IOException:java.lang.NullPointerException
 java.io.IOException: java.lang.NullPointerException
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:149)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1039)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:398)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.NullPointerException
   at java.util.ArrayList.addAll(ArrayList.java:472)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:141)
   ... 10 more



[jira] [Updated] (HIVE-1884) Potential risk of resource leaks in Hive

2011-05-19 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-1884:
---

Status: Patch Available  (was: In Progress)

 Potential risk of resource leaks in Hive
 

 Key: HIVE-1884
 URL: https://issues.apache.org/jira/browse/HIVE-1884
 Project: Hive
  Issue Type: Bug
  Components: CLI, Metastore, Query Processor, Server Infrastructure
Affects Versions: 0.6.0, 0.5.0, 0.4.1, 0.4.0, 0.3.0
 Environment: Hive 0.6.0, Hadoop 0.20.1
 SUSE Linux Enterprise Server 11 (i586)
Reporter: Mohit Sikri
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1884.1.PATCH, HIVE-1884.2.patch


 h3.There are a couple of resource leaks.
 h4.For example,
 In CliDriver.java, method processReader(), the buffered reader is not closed.
 h3.There is also a risk of resources being leaked; in such cases we need to 
 refactor the code to move the closing of resources into a finally block.
 h4.For example,
 In Throttle.java, method checkJobTracker(), the following code snippet 
 might cause a resource leak.
 {code}
 InputStream in = url.openStream();
 in.read(buffer);
 in.close();
 {code}
 Ideally, per best coding practices, it should be:
 {code}
 InputStream in = null;
 try {
   in = url.openStream();
   int numRead = in.read(buffer);
 } finally {
   IOUtils.closeStream(in);
 }
 {code}
 Similar cases were found in ExplainTask.java, DDLTask.java, etc. We need to 
 refactor all such occurrences.



[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name

2011-05-19 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036197#comment-13036197
 ] 

Chinna Rao Lalam commented on HIVE-1996:


After the file name is changed, the load still tries to use the old name, so 
the load fails.
The code is now changed to load with the renamed file: a map is introduced 
that maintains the old name and the new name as a key-value pair, and the 
load consults this map.
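That map-based lookup can be sketched as follows. This is a hypothetical illustration of the idea, not the attached patch; the method name resolve and the renamed file name kv2_copy_1.txt are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

public class RenameMapSketch {
    // Return the name to load: the renamed one if a rename happened,
    // otherwise the original source name.
    static String resolve(Map<String, String> renamed, String src) {
        return renamed.getOrDefault(src, src);
    }

    public static void main(String[] args) {
        Map<String, String> renamed = new HashMap<>();
        // The destination already held kv2.txt, so the new copy was renamed.
        renamed.put("kv2.txt", "kv2_copy_1.txt");

        System.out.println(resolve(renamed, "kv2.txt"));   // loads the renamed file
        System.out.println(resolve(renamed, "other.txt")); // unrenamed files pass through
    }
}
```

Because Hive.copyFiles keeps iterating the original srcs array, resolving each entry through such a map lets the move step find the file under its post-rename name.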

 LOAD DATA INPATH fails when the table already contains a file of the same 
 name
 

 Key: HIVE-1996
 URL: https://issues.apache.org/jira/browse/HIVE-1996
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1996.Patch


 Steps:
 1. From the command line copy the kv2.txt data file into the current user's 
 HDFS directory:
 {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt 
 kv2.txt}}
 2. In Hive, create the table:
 {{create table tst_src1 (key_ int, value_ string);}}
 3. Load the data into the table from HDFS:
 {{load data inpath './kv2.txt' into table tst_src1;}}
 4. Repeat step 1
 5. Repeat step 3
 Expected:
 To have kv2.txt renamed in HDFS and then copied to the destination as per 
 HIVE-307.
 Actual:
 File is renamed, but {{Hive.copyFiles}} doesn't see the change in {{srcs}} 
 as it continues to use the same array elements (with the un-renamed, old file 
 names). It crashes with this error:
 {noformat}
 java.lang.NullPointerException
 at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
 at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
 at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
 at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 {noformat}



[jira] [Updated] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name

2011-05-19 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-1996:
---

Status: Patch Available  (was: Open)

 LOAD DATA INPATH fails when the table already contains a file of the same 
 name
 

 Key: HIVE-1996
 URL: https://issues.apache.org/jira/browse/HIVE-1996
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1996.Patch


 Steps:
 1. From the command line copy the kv2.txt data file into the current user's 
 HDFS directory:
 {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt 
 kv2.txt}}
 2. In Hive, create the table:
 {{create table tst_src1 (key_ int, value_ string);}}
 3. Load the data into the table from HDFS:
 {{load data inpath './kv2.txt' into table tst_src1;}}
 4. Repeat step 1
 5. Repeat step 3
 Expected:
 To have kv2.txt renamed in HDFS and then copied to the destination as per 
 HIVE-307.
 Actual:
 File is renamed, but {{Hive.copyFiles}} doesn't see the change in {{srcs}} 
 as it continues to use the same array elements (with the un-renamed, old file 
 names). It crashes with this error:
 {noformat}
 java.lang.NullPointerException
 at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
 at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
 at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
 at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 {noformat}



[jira] [Commented] (HIVE-2147) Add api to send / receive message to metastore

2011-05-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036282#comment-13036282
 ] 

Ashutosh Chauhan commented on HIVE-2147:


Can someone take a look at this one? 

 Add api to send / receive message to metastore
 --

 Key: HIVE-2147
 URL: https://issues.apache.org/jira/browse/HIVE-2147
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: api-without-thrift.patch


 This is follow-up work on HIVE-2038.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-2170) Cannot run Hive 0.7.0 with Hadoop 0.20.203

2011-05-19 Thread Yifeng Geng (JIRA)
Cannot run Hive 0.7.0 with Hadoop 0.20.203 
---

 Key: HIVE-2170
 URL: https://issues.apache.org/jira/browse/HIVE-2170
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.7.0
Reporter: Yifeng Geng


Running Hive 0.7.0 against a running Hadoop 0.20.203 cluster produces the following error:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Hive history file=/tmp/yifeng/hive_job_log_yifeng_201105200054_1479252065.txt
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.hadoop.security.UserGroupInformation.login(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;
at 
org.apache.hadoop.hive.shims.Hadoop20Shims.getUGIForConf(Hadoop20Shims.java:448)
at 
org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator.setConf(HadoopDefaultAuthenticator.java:51)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at 
org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthenticator(HiveUtils.java:222)
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:219)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:417)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-2171) Allow custom serdes to set field comments

2011-05-19 Thread Jakob Homan (JIRA)
Allow custom serdes to set field comments
-

 Key: HIVE-2171
 URL: https://issues.apache.org/jira/browse/HIVE-2171
 Project: Hive
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan


Currently, while serde implementations can set a field's name, they can't set 
its comment.  These are set in the metastore utils to {{(from deserializer)}}.  
For those serdes that can provide meaningful comments for a field, those 
comments should be propagated to the table description.  The serde-provided 
comments could be prepended to {{(from deserializer)}} if others feel that's a 
meaningful distinction.  This change involves updating {{StructField}} to 
support a (possibly null) comment field and then propagating this change out to 
the myriad places {{StructField}} is thrown around.
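A minimal sketch of the proposed shape (interface and class names here are illustrative, not the actual Hive {{StructField}} API): a nullable comment on the field descriptor, with the metastore-style fallback to the old placeholder when the serde provides none.

```java
// Hypothetical field descriptor with a nullable comment, mirroring the
// proposal; not the real org.apache.hadoop.hive.serde2 interface.
interface FieldDescriptor {
    String getFieldName();
    String getFieldComment();   // may be null when the serde has no comment
}

class SimpleField implements FieldDescriptor {
    private final String name;
    private final String comment;
    SimpleField(String name, String comment) {
        this.name = name;
        this.comment = comment;
    }
    public String getFieldName() { return name; }
    public String getFieldComment() { return comment; }
}

public class StructFieldCommentSketch {
    // Fallback behavior: use the serde-provided comment when present,
    // otherwise the historical "(from deserializer)" placeholder.
    static String describe(FieldDescriptor f) {
        String c = f.getFieldComment();
        return f.getFieldName() + ": " + (c != null ? c : "(from deserializer)");
    }

    public static void main(String[] args) {
        System.out.println(describe(new SimpleField("user_id", "primary key of the user")));
        System.out.println(describe(new SimpleField("ts", null)));
    }
}
```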

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-trunk-h0.21 #738

2011-05-19 Thread Apache Jenkins Server
See https://builds.apache.org/hudson/job/Hive-trunk-h0.21/738/

--
[...truncated 30282 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-05-19_12-36-56_968_6407767470470923231/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-05-19 12:37:00,035 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-05-19_12-36-56_968_6407767470470923231/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201105191237_54578459.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-05-19_12-37-01_450_792647796604120/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-05-19_12-37-01_450_792647796604120/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201105191237_1426635029.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
 

Updated Hive Roadmap

2011-05-19 Thread Ning Zhang
Hi,

I've updated the roadmap wiki page (http://wiki.apache.org/hadoop/Hive/Roadmap) 
by removing some of the spam links and adding more projects that are up for grabs.

Most of the added projects are from a list of summer intern projects in 
Facebook. We also mentioned the list in the last Hive Contributor Meeting on 
April 25th. We are opening it up here so that the outside 
contributors/researchers may have a better view of the future work we are doing.

Please feel free to propose more interesting projects that benefit the whole 
Hive community.

Thanks,
Ning


[jira] [Commented] (HIVE-2036) Update bitmap indexes for automatic usage

2011-05-19 Thread Marquis Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036449#comment-13036449
 ] 

Marquis Wang commented on HIVE-2036:


Making notes on how to do this:

One of the difficult/different parts about using bitmap indexes is that the 
only time they become useful is when multiple indexes are combined. Thus, you 
need a query that joins the various bitmap index tables and returns the blocks 
that contain the rows we want.

Thus the two parts to writing the automatic use index handler for bitmap 
indexes are:

1. Figuring out what indexes to use:

As mentioned above, you may need to extend the IndexPredicateAnalyzer to 
support ORs and possibly to return a tree of predicates (I don't think it 
already does this).

2. Building a query that accesses the index tables:

This is an example query that I know works for querying the index tables for 
the following user query:

{noformat}
SELECT * FROM lineitem WHERE L_QUANTITY = 50.0 AND L_DISCOUNT = 0.08 AND L_TAX = 0.01;
{noformat}

{noformat}
SELECT bucketname AS `_bucketname`, COLLECT_SET(offset) AS `_offsets`
FROM (SELECT `_bucketname` AS bucketname, `_offset` AS offset
      FROM (SELECT ab.`_bucketname`, ab.`_offset`,
                   EWAH_BITMAP_AND(ab.bitmap, c.`_bitmaps`) AS bitmap
            FROM (SELECT a.`_bucketname`, b.`_offset`,
                         EWAH_BITMAP_AND(a.`_bitmaps`, b.`_bitmaps`) AS bitmap
                  FROM (SELECT * FROM default__lineitem_quantity__
                        WHERE L_QUANTITY = 50.0) a
                  JOIN (SELECT * FROM default__lineitem_discount__
                        WHERE L_DISCOUNT = 0.08) b
                    ON a.`_bucketname` = b.`_bucketname`
                       AND a.`_offset` = b.`_offset`) ab
            JOIN (SELECT * FROM default__lineitem_tax__
                  WHERE L_TAX = 0.01) c
              ON ab.`_bucketname` = c.`_bucketname`
                 AND ab.`_offset` = c.`_offset`) abc
      WHERE NOT EWAH_BITMAP_EMPTY(abc.bitmap)) t
GROUP BY bucketname;
{noformat}

This format is perfect for joining any number of AND predicates. I'm pretty 
sure you can figure out how to expand it to include OR predicates and 
different groupings of predicates as well. If you make any changes/extensions 
to the format, be sure to test them to confirm they have the performance 
characteristics you want.
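The combination step itself is simple to illustrate (a sketch only: {{java.util.BitSet}} stands in for the EWAH bitmaps, and the values are made up): each predicate yields a bitmap of matching rows, AND-ing them leaves only rows matching all predicates, and an emptiness check plays the role of {{NOT EWAH_BITMAP_EMPTY}}.

```java
import java.util.BitSet;

public class BitmapCombineSketch {
    public static void main(String[] args) {
        // One bitmap per predicate: bit i set means row i matches.
        BitSet quantity = new BitSet(); quantity.set(3); quantity.set(7);
        BitSet discount = new BitSet(); discount.set(3); discount.set(9);
        BitSet tax      = new BitSet(); tax.set(3);

        // AND-combine, like nested EWAH_BITMAP_AND(a, b) calls; an OR
        // predicate would use BitSet.or the same way.
        BitSet combined = (BitSet) quantity.clone();
        combined.and(discount);
        combined.and(tax);

        // Equivalent of filtering on NOT EWAH_BITMAP_EMPTY(...): only
        // non-empty results identify blocks worth fetching.
        System.out.println("rows to fetch: " + combined);
    }
}
```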

 Update bitmap indexes for automatic usage
 -

 Key: HIVE-2036
 URL: https://issues.apache.org/jira/browse/HIVE-2036
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Jeffrey Lym

 HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap 
 index support.  The bitmap code will need to be extended after it is 
 committed to enable automatic use of indexing.  Most work will be focused in 
 the BitmapIndexHandler, which needs to generate the re-entrant QL index 
 query.  There may also be significant work in the IndexPredicateAnalyzer to 
 support predicates with OR's, instead of just AND's as it is currently.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-2172) Hive CLI should let you specify database on the command line

2011-05-19 Thread Carl Steinbach (JIRA)
Hive CLI should let you specify database on the command line


 Key: HIVE-2172
 URL: https://issues.apache.org/jira/browse/HIVE-2172
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Reporter: Carl Steinbach
Priority: Minor


I'd like to be able to do the following:

{noformat}
% hive --dbname=mydb
hive> ...
{noformat}

instead of having to do:

{noformat}
% hive
hive> use mydb;
hive> ...
{noformat}


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2144) reduce workload generated by JDBCStatsPublisher

2011-05-19 Thread Tomasz Nykiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HIVE-2144:


Attachment: HIVE-2144.patch

 reduce workload generated by JDBCStatsPublisher
 ---

 Key: HIVE-2144
 URL: https://issues.apache.org/jira/browse/HIVE-2144
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Tomasz Nykiel
 Attachments: HIVE-2144.patch


 In JDBCStatsPublisher, we first try a SELECT query to see if the specific ID 
 was inserted by another task (most likely a speculative or previously 
 failed task). Depending on whether the ID is there, an INSERT or UPDATE query 
 is issued. So there are basically two queries per row inserted into the 
 intermediate stats table. This workload could be halved if we insert 
 unconditionally (it is very rare that IDs are duplicated) and use a different 
 SQL query in the aggregation phase to dedup the IDs (e.g., using group-by and 
 max()). The benefit is that even though the aggregation query is more 
 expensive, it is only run once per query. 
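The aggregation-phase dedup described above can be sketched outside SQL (IDs and counts are made up): keeping max(ROW_COUNT) per ID is the equivalent of a {{GROUP BY ID}} with {{MAX}}, which absorbs any duplicates left by speculative tasks.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DedupAggregationSketch {
    public static void main(String[] args) {
        // Raw stats rows as (ID, ROW_COUNT) pairs; "p1" is duplicated,
        // as a speculative or retried task would produce.
        List<Map.Entry<String, Long>> rows = List.of(
                Map.entry("p1", 464L),
                Map.entry("p1", 464L),
                Map.entry("p2", 1000L));

        // Equivalent of SELECT ID, MAX(ROW_COUNT) ... GROUP BY ID:
        // on duplicate keys, keep the larger count.
        Map<String, Long> agg = rows.stream().collect(
                Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Long::max));

        System.out.println(agg.get("p1"));
        System.out.println(agg.get("p2"));
    }
}
```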

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-2173) Alter table recover partitions

2011-05-19 Thread Ashutosh Chauhan (JIRA)
Alter table recover partitions
--

 Key: HIVE-2173
 URL: https://issues.apache.org/jira/browse/HIVE-2173
 Project: Hive
  Issue Type: New Feature
  Components: CLI, Metastore
Reporter: Ashutosh Chauhan


From mailing list thread: 
http://mail-archives.apache.org/mod_mbox/hive-user/201105.mbox/%3CBANLkTi=R1Dh2sNKyyJm=VsX=yqvx5mb...@mail.gmail.com%3E

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2144) reduce workload generated by JDBCStatsPublisher

2011-05-19 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036566#comment-13036566
 ] 

jirapos...@reviews.apache.org commented on HIVE-2144:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/765/
---

Review request for hive.


Summary
---

Currently, the JDBCStatsPublisher executes two queries per inserted row of 
statistics: a first query to check whether the ID was already inserted by 
another task, and a second query to insert a new row or update the existing one.
The update case occurs very rarely, since duplicates most likely originate from 
speculative or failed tasks.

Currently the schema of the stat table is the following:

PARTITION_STAT_TABLE ( ID VARCHAR(255), ROW_COUNT BIGINT ) and does not have 
any integrity constraints declared.

We amend it to:

PARTITION_STAT_TABLE ( ID VARCHAR(255) PRIMARY KEY , ROW_COUNT BIGINT ).

HIVE-2144 improves performance by greedily performing the insertion 
statement.
Instead of executing two queries per row inserted, we can execute one 
INSERT query.
In the case of a primary key constraint violation, we perform a single UPDATE 
query. The UPDATE query needs to check whether the currently inserted stats 
are newer than the ones already in the table.
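The insert-first strategy can be sketched as follows (a sketch only: a map stands in for the stats table so the snippet runs without a JDBC driver; in the real publisher the first step is an INSERT and the fallback a conditional UPDATE, and the "larger count means newer" rule is an assumption for illustration).

```java
import java.util.HashMap;
import java.util.Map;

public class GreedyInsertSketch {
    // ID -> ROW_COUNT, standing in for PARTITION_STAT_TABLE with a
    // primary key on ID.
    private final Map<String, Long> statsTable = new HashMap<>();

    void publish(String id, long rowCount) {
        // "INSERT": succeeds only when the ID is not present yet.
        Long existing = statsTable.putIfAbsent(id, rowCount);
        if (existing != null && existing < rowCount) {
            // Duplicate-key path (rare: speculative/retried task already
            // published): conditional "UPDATE" keeps the newer stats.
            statsTable.put(id, rowCount);
        }
    }

    public static void main(String[] args) {
        GreedyInsertSketch pub = new GreedyInsertSketch();
        pub.publish("part_001", 464);
        pub.publish("part_001", 464);   // speculative duplicate: no-op
        pub.publish("part_001", 500);   // newer stats win the update
        System.out.println(pub.statsTable.get("part_001"));
    }
}
```

The common path thus costs one statement per row instead of two, matching the review's claim.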


This addresses bug HIVE-2144.
https://issues.apache.org/jira/browse/HIVE-2144


Diffs
-

  
trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 
1125140 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisher.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/765/diff


Testing
---

TestStatsPublisher JUnit test:
- basic behaviour
- multiple updates
- cleanup of the statistics table after aggregation

Standalone testing on the cluster.
- insert/analyze queries over non-partitioned/partitioned tables

NOTE: For correct behaviour, the primary-key index needs to be created, or 
the PARTITION_STAT_TABLE table dropped, which triggers re-creation of the 
table with the constraint declared.


Thanks,

Tomasz



 reduce workload generated by JDBCStatsPublisher
 ---

 Key: HIVE-2144
 URL: https://issues.apache.org/jira/browse/HIVE-2144
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Tomasz Nykiel
 Attachments: HIVE-2144.patch


 In JDBCStatsPublisher, we first try a SELECT query to see if the specific ID 
 was inserted by another task (most likely a speculative or previously 
 failed task). Depending on whether the ID is there, an INSERT or UPDATE query 
 is issued. So there are basically two queries per row inserted into the 
 intermediate stats table. This workload could be halved if we insert 
 unconditionally (it is very rare that IDs are duplicated) and use a different 
 SQL query in the aggregation phase to dedup the IDs (e.g., using group-by and 
 max()). The benefit is that even though the aggregation query is more 
 expensive, it is only run once per query. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request: HIVE-2144 reduce workload generated by JDBCStatsPublisher

2011-05-19 Thread Tomasz Nykiel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/765/
---

Review request for hive.


Summary
---

Currently, the JDBCStatsPublisher executes two queries per inserted row of 
statistics: a first query to check whether the ID was already inserted by 
another task, and a second query to insert a new row or update the existing one.
The update case occurs very rarely, since duplicates most likely originate from 
speculative or failed tasks.

Currently the schema of the stat table is the following:

PARTITION_STAT_TABLE ( ID VARCHAR(255), ROW_COUNT BIGINT ) and does not have 
any integrity constraints declared.

We amend it to:

PARTITION_STAT_TABLE ( ID VARCHAR(255) PRIMARY KEY , ROW_COUNT BIGINT ).

HIVE-2144 improves performance by greedily performing the insertion 
statement.
Instead of executing two queries per row inserted, we can execute one 
INSERT query.
In the case of a primary key constraint violation, we perform a single UPDATE 
query. The UPDATE query needs to check whether the currently inserted stats 
are newer than the ones already in the table.


This addresses bug HIVE-2144.
https://issues.apache.org/jira/browse/HIVE-2144


Diffs
-

  
trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 
1125140 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisher.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/765/diff


Testing
---

TestStatsPublisher JUnit test:
- basic behaviour
- multiple updates
- cleanup of the statistics table after aggregation

Standalone testing on the cluster.
- insert/analyze queries over non-partitioned/partitioned tables

NOTE: For correct behaviour, the primary-key index needs to be created, or 
the PARTITION_STAT_TABLE table dropped, which triggers re-creation of the 
table with the constraint declared.


Thanks,

Tomasz



[jira] [Created] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.

2011-05-19 Thread Patrick Hunt (JIRA)
unit tests fail consistently when run according to instructions on hive how to 
contribute page.
-

 Key: HIVE-2174
 URL: https://issues.apache.org/jira/browse/HIVE-2174
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure, Testing Infrastructure
Affects Versions: 0.7.1, 0.8.0
Reporter: Patrick Hunt
Priority: Critical


The unit tests fail consistently when run according to the doc on hive how to 
contribute page. Specifically if you:

1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start 
with a _very_ clean slate)
2) ant clean test tar -logfile ant.log

the tests will fail (you can run just bucketmapjoin1.q instead of all the 
tests; it exhibits this behavior). However, if you instead do the following:

2) ant clean package test tar -logfile ant.log

the tests pass (notice the addition of package to the targets).

I've tried this on 5 different systems (mix of linux 32/64 bit) and the result 
is consistent.


Running

ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q

I see the following reason for failure

{quote}
[junit] 743c743
[junit] < numRows 0
[junit] ---
[junit] > numRows 464
[junit] 773c773
[junit] < numRows 0
[junit] ---
[junit] > numRows 464
[junit] 793c793
[junit] < numRows 0
[junit] ---
[junit] > numRows 464

which leads me to believe it's a metastore issue (statistics?)


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.

2011-05-19 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036579#comment-13036579
 ] 

Ning Zhang commented on HIVE-2174:
--

Yes, that's a documentation bug. We should always run 'ant package' before 
running tests or anything else. 'ant package' will download the necessary Ivy 
packages and put the required jar files under the build/ directory. For this 
particular error, I think the cause is that derby.jar is not present in the 
build/ directory without 'ant package'. 

A fix to the code would be to make 'ant test' depend on 'package'. But the 
downside is that each time you run some test it calls 'package', which is 
unnecessary after the first run. 

 unit tests fail consistently when run according to instructions on hive how 
 to contribute page.
 -

 Key: HIVE-2174
 URL: https://issues.apache.org/jira/browse/HIVE-2174
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure, Testing Infrastructure
Affects Versions: 0.7.1, 0.8.0
Reporter: Patrick Hunt
Priority: Critical

 The unit tests fail consistently when run according to the doc on hive how to 
 contribute page. Specifically if you:
 1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start 
 with a _very_ clean slate)
 2) ant clean test tar -logfile ant.log
 the tests will fail (you can run just bucketmapjoin1.q instead of all the 
 tests, it exhibits this behavior). However if you instead do the following 
 2) ant clean package test tar -logfile ant.log
 the tests pass (notice the addition to package to the targets).
 I've tried this on 5 different systems (mix of linux 32/64 bit) and the 
 result is consistent.
 Running
 ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q
 I see the following reason for failure
 {quote}
 [junit] 743c743
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 [junit] 773c773
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 [junit] 793c793
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 which leads me to believe it's a metastore issue (statistics?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.

2011-05-19 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036592#comment-13036592
 ] 

Patrick Hunt commented on HIVE-2174:


Thanks Ning, I updated the following page, perhaps you can review: 
http://wiki.apache.org/hadoop/Hive/HowToContribute

Any suggestions where else to look for instructions that should be fixed up? I 
grepped the latest codebase but didn't see anything obvious.

 unit tests fail consistently when run according to instructions on hive how 
 to contribute page.
 -

 Key: HIVE-2174
 URL: https://issues.apache.org/jira/browse/HIVE-2174
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure, Testing Infrastructure
Affects Versions: 0.7.1, 0.8.0
Reporter: Patrick Hunt
Priority: Critical

 The unit tests fail consistently when run according to the doc on hive how to 
 contribute page. Specifically if you:
 1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start 
 with a _very_ clean slate)
 2) ant clean test tar -logfile ant.log
 the tests will fail (you can run just bucketmapjoin1.q instead of all the 
 tests, it exhibits this behavior). However if you instead do the following 
 2) ant clean package test tar -logfile ant.log
 the tests pass (notice the addition to package to the targets).
 I've tried this on 5 different systems (mix of linux 32/64 bit) and the 
 result is consistent.
 Running
 ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q
 I see the following reason for failure
 {quote}
 [junit] 743c743
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 [junit] 773c773
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 [junit] 793c793
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 which leads me to believe it's a metastore issue (statistics?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.

2011-05-19 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned HIVE-2174:
--

Assignee: Patrick Hunt

 unit tests fail consistently when run according to instructions on hive how 
 to contribute page.
 -

 Key: HIVE-2174
 URL: https://issues.apache.org/jira/browse/HIVE-2174
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure, Testing Infrastructure
Affects Versions: 0.7.1, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical

 The unit tests fail consistently when run according to the doc on hive how to 
 contribute page. Specifically if you:
 1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start 
 with a _very_ clean slate)
 2) ant clean test tar -logfile ant.log
 the tests will fail (you can run just bucketmapjoin1.q instead of all the 
 tests, it exhibits this behavior). However if you instead do the following 
 2) ant clean package test tar -logfile ant.log
 the tests pass (notice the addition to package to the targets).
 I've tried this on 5 different systems (mix of linux 32/64 bit) and the 
 result is consistent.
 Running
 ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q
 I see the following reason for failure
 {quote}
 [junit] 743c743
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 [junit] 773c773
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 [junit] 793c793
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 which leads me to believe it's a metastore issue (statistics?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-2117) insert overwrite ignoring partition location

2011-05-19 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned HIVE-2117:
--

Assignee: Patrick Hunt

 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Attachments: HIVE-2117_br07.patch, data.txt


 The following code works differently in 0.5.0 vs 0.7.0.
 In 0.5.0 the partition location is respected. 
 However, in 0.7.0, while the initial partition is created with the specified 
 location path/parta, the insert overwrite ... results in the partition being 
 written to path/dt=a (note that path is the same in both cases).
 {code}
 create table foo_stg (bar INT, car INT); 
 load data local inpath 'data.txt' into table foo_stg;
  
 create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION 
 '/user/hive/warehouse/foo4'; 
 alter table foo4 add partition (dt='a') location 
 '/user/hive/warehouse/foo4/parta';
  
 from foo_stg fs insert overwrite table foo4 partition (dt='a') select *;
 {code}
 From what I can tell HIVE-1707 introduced this via a change to
 org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, 
 Map<String, String>, boolean, boolean)
 specifically:
 {code}
 +  Path partPath = new Path(tbl.getDataLocation().getPath(),
 +  Warehouse.makePartPath(partSpec));
 +
 +  Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath
 +  .toUri().getAuthority(), partPath.toUri().getPath());
 {code}
 Reading the description on HIVE-1707 it seems that this may have been done 
 purposefully; however, given that the partition location is explicitly 
 specified for the partition in question, it seems like it should be honored 
 (especially given that the table location has not changed).
 This difference in behavior is causing a regression in existing production 
 Hive-based code. I'd like to take a stab at addressing this; any suggestions?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.

2011-05-19 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036597#comment-13036597
 ] 

Carl Steinbach commented on HIVE-2174:
--

If the test target has a dependency on package, then this dependency should 
be made explicit in build.xml. Right now test indirectly depends on jar, 
which is why running 'ant test -Dtestcase=TestCliDriver 
-Dqfile=bucketmapjoin1.q' fails.

Also, instead of modifying the test target dependencies, I think we should 
adhere to ant conventions and modify the test.classpath 
so that it will work after running the jar target.
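The first option above can be sketched as a build.xml fragment (target names follow Ant conventions and are illustrative; this is not Hive's actual build file):

```xml
<!-- Hypothetical: declare the dependency explicitly instead of relying on
     an indirect one, so the packaged jars under build/ are guaranteed to
     exist before any test runs. -->
<target name="test" depends="package" description="Run unit tests">
  <!-- junit invocation goes here -->
</target>
```

The trade-off Ning noted applies: every `ant test` would then re-run `package`, which is redundant after the first build.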
 

 unit tests fail consistently when run according to instructions on hive how 
 to contribute page.
 -

 Key: HIVE-2174
 URL: https://issues.apache.org/jira/browse/HIVE-2174
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure, Testing Infrastructure
Affects Versions: 0.7.1, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Critical

 The unit tests fail consistently when run according to the doc on hive how to 
 contribute page. Specifically if you:
 1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start 
 with a _very_ clean slate)
 2) ant clean test tar -logfile ant.log
 the tests will fail (you can run just bucketmapjoin1.q instead of all the 
 tests, it exhibits this behavior). However if you instead do the following 
 2) ant clean package test tar -logfile ant.log
 the tests pass (notice the addition to package to the targets).
 I've tried this on 5 different systems (mix of linux 32/64 bit) and the 
 result is consistent.
 Running
 ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q
 I see the following reason for failure
 {quote}
 [junit] 743c743
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 [junit] 773c773
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 [junit] 793c793
 [junit] < numRows 0
 [junit] ---
 [junit] > numRows 464
 which leads me to believe it's a metastore issue (statistics?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

2011-05-19 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated HIVE-2117:
---

Attachment: HIVE-2117_trunk.patch
HIVE-2117_br07.patch

Updated patch files for branch 0.7 and trunk.

This fixes the problem -- I've also added a new test which verifies the 
location used for the partition. I verified this failed before my patch and 
passes after applying my patch.

 insert overwrite ignoring partition location
 

 Key: HIVE-2117
 URL: https://issues.apache.org/jira/browse/HIVE-2117
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, 
 HIVE-2117_trunk.patch, data.txt


 The following code works differently in 0.5.0 vs 0.7.0.
 In 0.5.0 the partition location is respected. 
 However, in 0.7.0, while the initial partition is created with the specified 
 location path/parta, the insert overwrite ... results in the partition being 
 written to path/dt=a (note that path is the same in both cases).
 {code}
 create table foo_stg (bar INT, car INT); 
 load data local inpath 'data.txt' into table foo_stg;
  
 create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION 
 '/user/hive/warehouse/foo4'; 
 alter table foo4 add partition (dt='a') location 
 '/user/hive/warehouse/foo4/parta';
  
 from foo_stg fs insert overwrite table foo4 partition (dt='a') select *;
 {code}
 From what I can tell HIVE-1707 introduced this via a change to
 org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, 
 Map<String, String>, boolean, boolean)
 specifically:
 {code}
 +  Path partPath = new Path(tbl.getDataLocation().getPath(),
 +  Warehouse.makePartPath(partSpec));
 +
 +  Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath
 +  .toUri().getAuthority(), partPath.toUri().getPath());
 {code}
 Reading the description on HIVE-1707, it seems this may have been done 
 purposefully; however, given that a location is explicitly specified 
 for the partition in question, it seems like it should be honored (especially 
 since the table location has not changed).
 This difference in behavior is causing a regression in existing production 
 Hive based code. I'd like to take a stab at addressing this, any suggestions?
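A minimal, self-contained illustration of the path arithmetic in the quoted snippet (plain strings standing in for Hadoop's Path; names hypothetical): deriving the partition path from the table location plus the partition spec discards any custom location the partition was registered with.

```java
public class PartPathDemo {
    // Mimics the HIVE-1707 logic quoted above: rebuild the partition path
    // from the table location and the partition spec, ignoring any location
    // the partition itself was created with.
    static String derivedPartPath(String tableLocation, String partSpec) {
        return tableLocation + "/" + partSpec;
    }

    public static void main(String[] args) {
        String tableLoc = "/user/hive/warehouse/foo4";
        // Location explicitly given in "alter table foo4 add partition ..."
        String registeredPartLoc = "/user/hive/warehouse/foo4/parta";
        String derived = derivedPartPath(tableLoc, "dt=a");

        // The derived path differs from the explicitly registered one,
        // which is exactly the regression described in this issue.
        System.out.println(derived);            // /user/hive/warehouse/foo4/dt=a
        System.out.println(registeredPartLoc);  // /user/hive/warehouse/foo4/parta
    }
}
```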

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-trunk-h0.21 #739

2011-05-19 Thread Apache Jenkins Server
See https://builds.apache.org/hudson/job/Hive-trunk-h0.21/739/changes

Changes:

[sdong] Test commit permission

--
[...truncated 30340 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-05-19_19-18-34_477_1564598283089016504/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-05-19 19:18:37,574 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-05-19_19-18-34_477_1564598283089016504/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201105191918_1447962702.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-05-19_19-18-39_103_6613876364711923264/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-05-19_19-18-39_103_6613876364711923264/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201105191918_1606305529.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table 

[jira] [Commented] (HIVE-2036) Update bitmap indexes for automatic usage

2011-05-19 Thread Russell Melick (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036685#comment-13036685
 ] 

Russell Melick commented on HIVE-2036:
--

To expand a bit on Marquis' comments.

In CompactIndexHandler.getIndexPredicateAnalyzer(), we instantiate a predicate 
analyzer.  My theory is that you're going to want a whole new PredicateAnalyzer 
class to deal with bitmaps, and then you'll instantiate it in a very similar 
way inside BitmapIndexHandler.  You can also see here how we only search for 
columns on which we have indexes.  This is going to need to be modified, since 
it currently only allows columns from a single index.

You may also want to rewrite some of the logic in 
IndexWhereProcessor.process():110.  It currently loops through every index 
available and asks it to do a rewrite.  Perhaps it should loop through every 
index type and try to find the rewrites possible only using indexes of that 
type.

If you look at IndexPredicateAnalyzer:123, you can see where it's making sure 
that all the parent operators are AND operations.  It should be easy to modify 
this to allow OR operations, but I'm not sure that simply allowing them and 
using the current system will maintain logical correctness.  It's probably 
better to start off with just AND's.

The pushedPredicate is the important thing returned by the predicate analyzer.  
The pushed predicate is what it was able to recognize/process.  That's the tree 
you'll want to use to generate the bitmap query.  The residual predicate is 
what it couldn't process. There's a separate JIRA open (HIVE-2115) to use the 
residual to cut down on remaining work.
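As a toy illustration of the pushed/residual split (not Hive's actual IndexPredicateAnalyzer; class, column, and method names invented): a conjunction can be partitioned into conjuncts the index query can answer and the residual left for normal evaluation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PredicateSplitDemo {
    // Each conjunct is a simple string of the form "column op value",
    // e.g. "key = 10". Returns {pushed, residual}.
    static List<List<String>> split(List<String> conjuncts, Set<String> indexedCols) {
        List<String> pushed = new ArrayList<>();
        List<String> residual = new ArrayList<>();
        for (String c : conjuncts) {
            String col = c.split(" ")[0];
            // Conjuncts over indexed columns can be answered by the
            // re-entrant index query; everything else is residual.
            (indexedCols.contains(col) ? pushed : residual).add(c);
        }
        return Arrays.asList(pushed, residual);
    }

    public static void main(String[] args) {
        List<String> conjuncts = Arrays.asList("key = 10", "value LIKE 'x%'");
        Set<String> indexed = new HashSet<>(Arrays.asList("key"));
        List<List<String>> parts = split(conjuncts, indexed);
        System.out.println("pushed:   " + parts.get(0));  // [key = 10]
        System.out.println("residual: " + parts.get(1));  // [value LIKE 'x%']
    }
}
```

Supporting ORs would mean a whole disjunct is only pushable when every branch is, which is why starting with ANDs alone, as suggested above, is the safer first step.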

The query generation lives in the IndexHandlers.generateIndexQuery(...).  
You'll definitely need more logic than the simple call to 
decomposedPredicate.pushedPredicate.getExprString() that is in the 
CompactIndexHandler.

There are a few spots where hive.index.compact.file is used.  These may need 
to be generalized.  However, Marquis may have already taken care of this with the 
bitmap stuff.  I don't remember what the new name for it was (I think it's 
hive.index.blockfilter.file), but it's probably easiest to look in one of his 
unit tests for it.

The last thing I can think of is that having multiple index types on a single 
table, or queries that use multiple tables may become an issue.  I created 
HIVE-2128 to deal with the multiple tables.

Good luck!

 Update bitmap indexes for automatic usage
 -

 Key: HIVE-2036
 URL: https://issues.apache.org/jira/browse/HIVE-2036
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: Russell Melick
Assignee: Jeffrey Lym

 HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap 
 index support.  The bitmap code will need to be extended after it is 
 committed to enable automatic use of indexing.  Most work will be focused in 
 the BitmapIndexHandler, which needs to generate the re-entrant QL index 
 query.  There may also be significant work in the IndexPredicateAnalyzer to 
 support predicates with OR's, instead of just AND's as it is currently.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira