[jira] [Updated] (HIVE-2080) Few code improvements in the ql and serde packages.
[ https://issues.apache.org/jira/browse/HIVE-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-2080: --- Status: Patch Available (was: Open) Few code improvements in the ql and serde packages. --- Key: HIVE-2080 URL: https://issues.apache.org/jira/browse/HIVE-2080 Project: Hive Issue Type: Bug Components: Query Processor, Serializers/Deserializers Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2080.Patch Few code improvements in the ql and serde packages: 1) Small performance improvements 2) Null checks to avoid NPEs 3) Effective variable management. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
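The second item, null checks to avoid NPEs, is the usual defensive-coding pattern. A minimal illustrative sketch (the class and method names here are hypothetical, not taken from the HIVE-2080 patch):

```java
import java.util.List;

// Illustrative only: guard against null before dereferencing, the kind of
// check HIVE-2080 adds in ql and serde code paths.
public class NullGuardExample {
    // Returns 0 instead of throwing NullPointerException when the list is null.
    public static int safeSize(List<String> items) {
        if (items == null) {
            return 0;
        }
        return items.size();
    }
}
```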
[jira] [Created] (HIVE-2169) Hive should have support for clover and findbugs
Hive should have support for Clover and FindBugs Key: HIVE-2169 URL: https://issues.apache.org/jira/browse/HIVE-2169 Project: Hive Issue Type: Improvement Components: Build Infrastructure Reporter: Iyappan Priority: Minor Fix For: 0.7.1 Hive should have support for Clover and FindBugs. Clover delivers actionable Java code coverage metrics to assess the impact of unit tests. FindBugs is a bug pattern detector for Java. Both can give useful information on code coverage and potential bugs. Clover and FindBugs support should be added as Ant targets.
[jira] [Updated] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-1996: --- Attachment: HIVE-1996.Patch LOAD DATA INPATH fails when the table already contains a file of the same name Key: HIVE-1996 URL: https://issues.apache.org/jira/browse/HIVE-1996 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Kirk True Assignee: Chinna Rao Lalam Attachments: HIVE-1996.Patch Steps: 1. From the command line copy the kv2.txt data file into the current user's HDFS directory: {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}} 2. In Hive, create the table: {{create table tst_src1 (key_ int, value_ string);}} 3. Load the data into the table from HDFS: {{load data inpath './kv2.txt' into table tst_src1;}} 4. Repeat step 1 5. Repeat step 3 Expected: To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307. Actual: File is renamed, but {{Hive.copyFiles}} doesn't see the change in {{srcs}} as it continues to use the same array elements (with the un-renamed, old file names). 
It crashes with this error: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725) at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173) at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) {noformat}
[jira] [Updated] (HIVE-2111) NullPointerException on select * with table using RegexSerDe and partitions
[ https://issues.apache.org/jira/browse/HIVE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-2111: --- Attachment: HIVE-2111.patch NullPointerException on select * with table using RegexSerDe and partitions --- Key: HIVE-2111 URL: https://issues.apache.org/jira/browse/HIVE-2111 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.7.0 Environment: Amazon Elastic Mapreduce Reporter: Marc Harris Attachments: HIVE-2111.patch When querying against a table that is partitioned and uses RegexSerDe, select with explicit columns works, but select * results in a NullPointerException. To reproduce: 1) create a table containing the following text (notice the blank line): start fillerdatafillerdatafiller fillerdata2fillerdata2filler =end= 2) copy the file to hdfs: hadoop dfs -put foo.txt test/part1=x/foo.txt 3) run the following hive commands to create a table: add jar s3://elasticmapreduce/samples/hive/jars/hive_contrib.jar; drop table test; create external table test(col1 STRING, col2 STRING) partitioned by (part1 STRING) row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties ( input.regex = ^\(.*data\)\(.*data\).*$) stored as textfile location 'hdfs:///user/hadoop/test'; alter table test add partition (part1='x'); (Note that the text processor seems to have mangled the regex a bit. Inside each pair of parentheses should be dot star data. After the second pair of parentheses should be dot star dollar). 
4) select from it with explicit columns: select part1, col1, col2 from test; outputs: OK x fillerdata fillerdata x NULLNULL x fillerdata 2fillerdata 5) select from it with * columns select * from test; outputs: Failed with exception java.io.IOException:java.lang.NullPointerException 11/04/12 14:28:27 ERROR CliDriver: Failed with exception java.io.IOException:java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:149) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172) at org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.NullPointerException at java.util.ArrayList.addAll(ArrayList.java:472) at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:141) ... 10 more
[jira] [Updated] (HIVE-1884) Potential risk of resource leaks in Hive
[ https://issues.apache.org/jira/browse/HIVE-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-1884: --- Attachment: HIVE-1884.2.patch Potential risk of resource leaks in Hive Key: HIVE-1884 URL: https://issues.apache.org/jira/browse/HIVE-1884 Project: Hive Issue Type: Bug Components: CLI, Metastore, Query Processor, Server Infrastructure Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0 Environment: Hive 0.6.0, Hadoop 0.20.1 SUSE Linux Enterprise Server 11 (i586) Reporter: Mohit Sikri Assignee: Chinna Rao Lalam Attachments: HIVE-1884.1.PATCH, HIVE-1884.2.patch h3. There are a couple of resource leaks. h4. For example, in CliDriver.java, method processReader(), the buffered reader is not closed. h3. There is also a risk of resources being leaked; in such cases the code should be refactored to close resources in a finally block. h4. For example, in Throttle.java, method checkJobTracker(), the following code snippet might leak the stream: {code} InputStream in = url.openStream(); in.read(buffer); in.close(); {code} Per best coding practice it should be: {code} InputStream in = null; try { in = url.openStream(); int numRead = in.read(buffer); } finally { IOUtils.closeStream(in); } {code} Similar cases were found in ExplainTask.java, DDLTask.java, etc. All such occurrences need to be refactored.
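The try/finally fix described above can equally be written with Java 7 try-with-resources, which closes the stream even when read() throws. A minimal sketch, assuming only the single read needs guarding (the class name is illustrative, not from the patch):

```java
import java.io.InputStream;
import java.net.URL;

// Sketch of the HIVE-1884 fix pattern: the stream is always closed,
// even if read() throws, without an explicit finally block.
public class StreamCloseExample {
    // Reads up to buffer.length bytes from the URL and returns the count.
    public static int readOnce(URL url, byte[] buffer) throws Exception {
        try (InputStream in = url.openStream()) {
            return in.read(buffer);
        }
    }
}
```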
Review Request: Potential risk of resource leaks in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/758/ --- Review request for hive. Summary --- Potential risk of resource leaks in Hive This addresses bug HIVE-1884. https://issues.apache.org/jira/browse/HIVE-1884 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1124130 trunk/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableInput.java 1124130 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1124130 trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java 1124130 Diff: https://reviews.apache.org/r/758/diff Testing --- All tests passed Thanks, chinna
[jira] [Commented] (HIVE-1884) Potential risk of resource leaks in Hive
[ https://issues.apache.org/jira/browse/HIVE-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036187#comment-13036187 ] jirapos...@reviews.apache.org commented on HIVE-1884: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/758/ --- Review request for hive. Summary --- Potential risk of resource leaks in Hive This addresses bug HIVE-1884. https://issues.apache.org/jira/browse/HIVE-1884 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1124130 trunk/contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableInput.java 1124130 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1124130 trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java 1124130 Diff: https://reviews.apache.org/r/758/diff Testing --- All tests passed Thanks, chinna Potential risk of resource leaks in Hive Key: HIVE-1884 URL: https://issues.apache.org/jira/browse/HIVE-1884 Project: Hive Issue Type: Bug Components: CLI, Metastore, Query Processor, Server Infrastructure Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0 Environment: Hive 0.6.0, Hadoop 0.20.1 SUSE Linux Enterprise Server 11 (i586) Reporter: Mohit Sikri Assignee: Chinna Rao Lalam Attachments: HIVE-1884.1.PATCH, HIVE-1884.2.patch h3. There are a couple of resource leaks. h4. For example, in CliDriver.java, method processReader(), the buffered reader is not closed. h3. There is also a risk of resources being leaked; in such cases the code should be refactored to close resources in a finally block. h4. For example, in Throttle.java, method checkJobTracker(), the following code snippet might leak the stream: 
{code} InputStream in = url.openStream(); in.read(buffer); in.close(); {code} Per best coding practice it should be: {code} InputStream in = null; try { in = url.openStream(); int numRead = in.read(buffer); } finally { IOUtils.closeStream(in); } {code} Similar cases were found in ExplainTask.java, DDLTask.java, etc. All such occurrences need to be refactored.
[jira] [Assigned] (HIVE-2111) NullPointerException on select * with table using RegexSerDe and partitions
[ https://issues.apache.org/jira/browse/HIVE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam reassigned HIVE-2111: -- Assignee: Chinna Rao Lalam NullPointerException on select * with table using RegexSerDe and partitions --- Key: HIVE-2111 URL: https://issues.apache.org/jira/browse/HIVE-2111 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.7.0 Environment: Amazon Elastic Mapreduce Reporter: Marc Harris Assignee: Chinna Rao Lalam Attachments: HIVE-2111.patch When querying against a table that is partitioned and uses RegexSerDe, select with explicit columns works, but select * results in a NullPointerException. To reproduce: 1) create a table containing the following text (notice the blank line): start fillerdatafillerdatafiller fillerdata2fillerdata2filler =end= 2) copy the file to hdfs: hadoop dfs -put foo.txt test/part1=x/foo.txt 3) run the following hive commands to create a table: add jar s3://elasticmapreduce/samples/hive/jars/hive_contrib.jar; drop table test; create external table test(col1 STRING, col2 STRING) partitioned by (part1 STRING) row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties ( input.regex = ^\(.*data\)\(.*data\).*$) stored as textfile location 'hdfs:///user/hadoop/test'; alter table test add partition (part1='x'); (Note that the text processor seems to have mangled the regex a bit. Inside each pair of parentheses should be dot star data. After the second pair of parentheses should be dot star dollar). 
4) select from it with explicit columns: select part1, col1, col2 from test; outputs: OK x fillerdata fillerdata x NULLNULL x fillerdata 2fillerdata 5) select from it with * columns select * from test; outputs: Failed with exception java.io.IOException:java.lang.NullPointerException 11/04/12 14:28:27 ERROR CliDriver: Failed with exception java.io.IOException:java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:149) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172) at org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.NullPointerException at java.util.ArrayList.addAll(ArrayList.java:472) at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:141) ... 10 more
[jira] [Commented] (HIVE-2111) NullPointerException on select * with table using RegexSerDe and partitions
[ https://issues.apache.org/jira/browse/HIVE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036191#comment-13036191 ] Chinna Rao Lalam commented on HIVE-2111: Whenever the regular expression does not match, deserialization returns null, but it should return a row of nulls. The same happens for an empty row. NullPointerException on select * with table using RegexSerDe and partitions --- Key: HIVE-2111 URL: https://issues.apache.org/jira/browse/HIVE-2111 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.7.0 Environment: Amazon Elastic Mapreduce Reporter: Marc Harris Assignee: Chinna Rao Lalam Attachments: HIVE-2111.patch When querying against a table that is partitioned and uses RegexSerDe, select with explicit columns works, but select * results in a NullPointerException. To reproduce: 1) create a table containing the following text (notice the blank line): start fillerdatafillerdatafiller fillerdata2fillerdata2filler =end= 2) copy the file to hdfs: hadoop dfs -put foo.txt test/part1=x/foo.txt 3) run the following hive commands to create a table: add jar s3://elasticmapreduce/samples/hive/jars/hive_contrib.jar; drop table test; create external table test(col1 STRING, col2 STRING) partitioned by (part1 STRING) row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties ( input.regex = ^\(.*data\)\(.*data\).*$) stored as textfile location 'hdfs:///user/hadoop/test'; alter table test add partition (part1='x'); (Note that the text processor seems to have mangled the regex a bit. Inside each pair of parentheses should be dot star data. After the second pair of parentheses should be dot star dollar). 
4) select from it with explicit columns: select part1, col1, col2 from test; outputs: OK x fillerdata fillerdata x NULLNULL x fillerdata 2fillerdata 5) select from it with * columns select * from test; outputs: Failed with exception java.io.IOException:java.lang.NullPointerException 11/04/12 14:28:27 ERROR CliDriver: Failed with exception java.io.IOException:java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:149) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172) at org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.NullPointerException at java.util.ArrayList.addAll(ArrayList.java:472) at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:141) ... 10 more
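The fix Chinna describes in his comment, returning a row of nulls rather than null when the regex does not match, can be sketched as follows. This is a simplified stand-in for RegexSerDe.deserialize, not the actual HIVE-2111 patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for the RegexSerDe deserialize logic: on a
// non-matching or empty line, return a row whose columns are all null
// instead of returning null itself, so callers like "select *" never NPE.
public class RegexRowExample {
    public static List<String> deserialize(Pattern p, String line, int numColumns) {
        // Arrays.asList(new String[n]) yields [null, null, ...]: the null row.
        List<String> row = new ArrayList<>(Arrays.asList(new String[numColumns]));
        Matcher m = p.matcher(line == null ? "" : line);
        if (!m.matches()) {
            return row; // all-null row, never null
        }
        for (int i = 0; i < numColumns; i++) {
            row.set(i, m.group(i + 1));
        }
        return row;
    }
}
```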
[jira] [Updated] (HIVE-2111) NullPointerException on select * with table using RegexSerDe and partitions
[ https://issues.apache.org/jira/browse/HIVE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-2111: --- Status: Patch Available (was: Open) NullPointerException on select * with table using RegexSerDe and partitions --- Key: HIVE-2111 URL: https://issues.apache.org/jira/browse/HIVE-2111 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.7.0 Environment: Amazon Elastic Mapreduce Reporter: Marc Harris Assignee: Chinna Rao Lalam Attachments: HIVE-2111.patch When querying against a table that is partitioned and uses RegexSerDe, select with explicit columns works, but select * results in a NullPointerException. To reproduce: 1) create a table containing the following text (notice the blank line): start fillerdatafillerdatafiller fillerdata2fillerdata2filler =end= 2) copy the file to hdfs: hadoop dfs -put foo.txt test/part1=x/foo.txt 3) run the following hive commands to create a table: add jar s3://elasticmapreduce/samples/hive/jars/hive_contrib.jar; drop table test; create external table test(col1 STRING, col2 STRING) partitioned by (part1 STRING) row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties ( input.regex = ^\(.*data\)\(.*data\).*$) stored as textfile location 'hdfs:///user/hadoop/test'; alter table test add partition (part1='x'); (Note that the text processor seems to have mangled the regex a bit. Inside each pair of parentheses should be dot star data. After the second pair of parentheses should be dot star dollar). 
4) select from it with explicit columns: select part1, col1, col2 from test; outputs: OK x fillerdata fillerdata x NULLNULL x fillerdata 2fillerdata 5) select from it with * columns select * from test; outputs: Failed with exception java.io.IOException:java.lang.NullPointerException 11/04/12 14:28:27 ERROR CliDriver: Failed with exception java.io.IOException:java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:149) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172) at org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.NullPointerException at java.util.ArrayList.addAll(ArrayList.java:472) at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:141) ... 10 more
[jira] [Updated] (HIVE-1884) Potential risk of resource leaks in Hive
[ https://issues.apache.org/jira/browse/HIVE-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-1884: --- Status: Patch Available (was: In Progress) Potential risk of resource leaks in Hive Key: HIVE-1884 URL: https://issues.apache.org/jira/browse/HIVE-1884 Project: Hive Issue Type: Bug Components: CLI, Metastore, Query Processor, Server Infrastructure Affects Versions: 0.6.0, 0.5.0, 0.4.1, 0.4.0, 0.3.0 Environment: Hive 0.6.0, Hadoop 0.20.1 SUSE Linux Enterprise Server 11 (i586) Reporter: Mohit Sikri Assignee: Chinna Rao Lalam Attachments: HIVE-1884.1.PATCH, HIVE-1884.2.patch h3. There are a couple of resource leaks. h4. For example, in CliDriver.java, method processReader(), the buffered reader is not closed. h3. There is also a risk of resources being leaked; in such cases the code should be refactored to close resources in a finally block. h4. For example, in Throttle.java, method checkJobTracker(), the following code snippet might leak the stream: {code} InputStream in = url.openStream(); in.read(buffer); in.close(); {code} Per best coding practice it should be: {code} InputStream in = null; try { in = url.openStream(); int numRead = in.read(buffer); } finally { IOUtils.closeStream(in); } {code} Similar cases were found in ExplainTask.java, DDLTask.java, etc. All such occurrences need to be refactored.
[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036197#comment-13036197 ] Chinna Rao Lalam commented on HIVE-1996: After the file name is changed, the load still tries to use the old name, which is why it fails. The code is now changed to load using the renamed file: a map is introduced that keeps the old name and the new name as a key-value pair, and the load path consults this map. LOAD DATA INPATH fails when the table already contains a file of the same name Key: HIVE-1996 URL: https://issues.apache.org/jira/browse/HIVE-1996 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Kirk True Assignee: Chinna Rao Lalam Attachments: HIVE-1996.Patch Steps: 1. From the command line copy the kv2.txt data file into the current user's HDFS directory: {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}} 2. In Hive, create the table: {{create table tst_src1 (key_ int, value_ string);}} 3. Load the data into the table from HDFS: {{load data inpath './kv2.txt' into table tst_src1;}} 4. Repeat step 1 5. Repeat step 3 Expected: To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307. Actual: File is renamed, but {{Hive.copyFiles}} doesn't see the change in {{srcs}} as it continues to use the same array elements (with the un-renamed, old file names). 
It crashes with this error: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725) at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173) at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) {noformat}
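The approach described in the HIVE-1996 comment, tracking renamed files in a map keyed by the original name so the copy step uses the new name rather than the stale one, might look like this. This is a simplified sketch, not the actual Hive.copyFiles code, and the renaming scheme is illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified sketch of the HIVE-1996 approach: when a source file name
// collides with an existing destination name, record old -> new so the
// later copy step consults the map instead of the stale array element.
public class RenameTrackingExample {
    public static Map<String, String> planRenames(String[] srcs, Set<String> existing) {
        Map<String, String> renames = new HashMap<>();
        for (String src : srcs) {
            String target = src;
            int copy = 1;
            while (existing.contains(target)) {
                target = src + "_copy_" + copy++; // naming scheme is illustrative
            }
            renames.put(src, target);
            existing.add(target); // reserve the name for subsequent sources
        }
        return renames;
    }
}
```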
[jira] [Updated] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-1996: --- Status: Patch Available (was: Open) LOAD DATA INPATH fails when the table already contains a file of the same name Key: HIVE-1996 URL: https://issues.apache.org/jira/browse/HIVE-1996 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Kirk True Assignee: Chinna Rao Lalam Attachments: HIVE-1996.Patch Steps: 1. From the command line copy the kv2.txt data file into the current user's HDFS directory: {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}} 2. In Hive, create the table: {{create table tst_src1 (key_ int, value_ string);}} 3. Load the data into the table from HDFS: {{load data inpath './kv2.txt' into table tst_src1;}} 4. Repeat step 1 5. Repeat step 3 Expected: To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307. Actual: File is renamed, but {{Hive.copyFiles}} doesn't see the change in {{srcs}} as it continues to use the same array elements (with the un-renamed, old file names). 
It crashes with this error: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725) at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173) at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) {noformat}
[jira] [Commented] (HIVE-2147) Add api to send / receive message to metastore
[ https://issues.apache.org/jira/browse/HIVE-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036282#comment-13036282 ] Ashutosh Chauhan commented on HIVE-2147: Can someone take a look at this one? Add api to send / receive message to metastore -- Key: HIVE-2147 URL: https://issues.apache.org/jira/browse/HIVE-2147 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: api-without-thrift.patch This is follow-up work on HIVE-2038.
[jira] [Created] (HIVE-2170) Cannot run Hive 0.7.0 with Hadoop 0.20.203
Cannot run Hive 0.7.0 with Hadoop 0.20.203 --- Key: HIVE-2170 URL: https://issues.apache.org/jira/browse/HIVE-2170 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.7.0 Reporter: Yifeng Geng Running Hive 0.7.0 against a running Hadoop 0.20.203 cluster fails with the following error: WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. Hive history file=/tmp/yifeng/hive_job_log_yifeng_201105200054_1479252065.txt Exception in thread main java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.login(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation; at org.apache.hadoop.hive.shims.Hadoop20Shims.getUGIForConf(Hadoop20Shims.java:448) at org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator.setConf(HadoopDefaultAuthenticator.java:51) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthenticator(HiveUtils.java:222) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:219) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:417) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[jira] [Created] (HIVE-2171) Allow custom serdes to set field comments
Allow custom serdes to set field comments - Key: HIVE-2171 URL: https://issues.apache.org/jira/browse/HIVE-2171 Project: Hive Issue Type: Improvement Reporter: Jakob Homan Assignee: Jakob Homan Currently, while serde implementations can set a field's name, they can't set its comment. These are set in the metastore utils to {{(from deserializer)}}. For serdes that can provide meaningful comments for a field, those comments should be propagated to the table description. These serde-provided comments could be prepended to (from deserializer) if others feel that's a meaningful distinction. This change involves updating {{StructField}} to support a (possibly null) comment field and then propagating this change out to the myriad places {{StructField}} is thrown around. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-trunk-h0.21 #738
See https://builds.apache.org/hudson/job/Hive-trunk-h0.21/738/ -- [...truncated 30282 lines...] [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-05-19_12-36-56_968_6407767470470923231/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2011-05-19 12:37:00,035 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-05-19_12-36-56_968_6407767470470923231/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201105191237_54578459.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable 
[junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-05-19_12-37-01_450_792647796604120/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-05-19_12-37-01_450_792647796604120/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history 
file=https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201105191237_1426635029.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE
Updated Hive Roadmap
Hi, I've updated the roadmap wiki page (http://wiki.apache.org/hadoop/Hive/Roadmap) by removing some of the spam links and adding more projects up for grabs. Most of the added projects are from a list of summer intern projects at Facebook. We also mentioned the list in the last Hive Contributor Meeting on April 25th. We are opening it up here so that outside contributors/researchers may have a better view of the future work we are doing. Please feel free to propose more interesting projects that benefit the whole Hive community. Thanks, Ning
[jira] [Commented] (HIVE-2036) Update bitmap indexes for automatic usage
[ https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036449#comment-13036449 ] Marquis Wang commented on HIVE-2036: Making notes on how to do this: One of the difficult/different parts about using bitmap indexes is that the only time they become useful is when multiple indexes are combined. Thus, you need a query that joins the various bitmap index tables and returns the blocks that contain the rows we want. So the two parts to writing the automatic-use index handler for bitmap indexes are: 1. Figuring out what indexes to use: As mentioned above, you may need to extend the IndexPredicateAnalyzer to support ORs and possibly to return a tree of predicates (I don't think it already does this). 2. Building a query that accesses the index tables: This is an example query that I know works for querying the index tables for the query {noformat} SELECT * FROM lineitem WHERE L_QUANTITY = 50.0 AND L_DISCOUNT = 0.08 AND L_TAX = 0.01; {noformat} {noformat} SELECT bucketname AS `_bucketname`, COLLECT_SET(offset) as `_offsets` FROM (SELECT `_bucketname` AS bucketname, `_offset` AS offset FROM (SELECT ab.`_bucketname`, ab.`_offset`, EWAH_BITMAP_AND(ab.bitmap, c.`_bitmaps`) as bitmap FROM (SELECT a.`_bucketname`, b.`_offset`, EWAH_BITMAP_AND(a.`_bitmaps`, b.`_bitmaps`) as bitmap FROM (SELECT * FROM default__lineitem_quantity__ WHERE L_QUANTITY = 50.0) a JOIN (SELECT * FROM default__lineitem_discount__ WHERE L_DISCOUNT = 0.08) b ON a.`_bucketname` = b.`_bucketname` AND a.`_offset` = b.`_offset`) ab JOIN (SELECT * FROM default__lineitem_tax__ WHERE L_TAX = 0.01) c ON ab.`_bucketname` = c.`_bucketname` AND ab.`_offset` = c.`_offset`) abc WHERE NOT EWAH_BITMAP_EMPTY(abc.bitmap) ) t GROUP BY bucketname; {noformat} This format is perfect for joining any number of AND predicates. I'm pretty sure you can figure out how to expand them to include OR predicates and different grouping of predicates as well. 
If you make any changes/extensions to the format, be sure to test them to verify they have the performance characteristics you want. Update bitmap indexes for automatic usage - Key: HIVE-2036 URL: https://issues.apache.org/jira/browse/HIVE-2036 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: Russell Melick Assignee: Jeffrey Lym HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap index support. The bitmap code will need to be extended after it is committed to enable automatic use of indexing. Most work will be focused in the BitmapIndexHandler, which needs to generate the re-entrant QL index query. There may also be significant work in the IndexPredicateAnalyzer to support predicates with OR's, instead of just AND's as it does currently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
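The intuition in the comment above — that bitmap indexes only pay off when several are intersected — can be sketched outside Hive with plain Python integers standing in for EWAH-compressed bitmaps. This is an illustration of the idea only, not Hive code; the rows and predicates are made up:

```python
# Sketch: one bitmap per indexed predicate, with bit i set when row i
# satisfies that predicate. AND-ing the bitmaps (the role played by
# EWAH_BITMAP_AND in the query above) yields the rows matching all predicates.

def bitmap_for(rows, predicate):
    """Build a bitmap (as an int) with bit i set when predicate(rows[i]) holds."""
    bm = 0
    for i, row in enumerate(rows):
        if predicate(row):
            bm |= 1 << i
    return bm

# Hypothetical lineitem-like rows, mirroring the example query's predicates.
rows = [
    {"quantity": 50.0, "discount": 0.08, "tax": 0.01},
    {"quantity": 50.0, "discount": 0.05, "tax": 0.01},
    {"quantity": 10.0, "discount": 0.08, "tax": 0.01},
]

qty = bitmap_for(rows, lambda r: r["quantity"] == 50.0)   # matches rows 0, 1
disc = bitmap_for(rows, lambda r: r["discount"] == 0.08)  # matches rows 0, 2
tax = bitmap_for(rows, lambda r: r["tax"] == 0.01)        # matches all rows

combined = qty & disc & tax  # analogue of the nested EWAH_BITMAP_AND joins
matches = [i for i in range(len(rows)) if combined >> i & 1]
print(matches)  # [0]
```

Any single bitmap here selects two or three rows; only the intersection narrows the result to row 0, which is why the index handler must join multiple index tables rather than consult one.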
[jira] [Created] (HIVE-2172) Hive CLI should let you specify database on the command line
Hive CLI should let you specify database on the command line Key: HIVE-2172 URL: https://issues.apache.org/jira/browse/HIVE-2172 Project: Hive Issue Type: New Feature Components: CLI Reporter: Carl Steinbach Priority: Minor I'd like to be able to do the following: {noformat} % hive --dbname=mydb hive ... {noformat} instead of having to do: {noformat} % hive hive use mydb; hive ... {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2144) reduce workload generated by JDBCStatsPublisher
[ https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomasz Nykiel updated HIVE-2144: Attachment: HIVE-2144.patch reduce workload generated by JDBCStatsPublisher --- Key: HIVE-2144 URL: https://issues.apache.org/jira/browse/HIVE-2144 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Tomasz Nykiel Attachments: HIVE-2144.patch In JDBCStatsPublisher, we first try a SELECT query to see if the specific ID was inserted by another task (most likely a speculative or previously failed task). Depending on whether the ID is there, an INSERT or UPDATE query is issued. So there are basically 2x queries per row inserted into the intermediate stats table. This workload could be cut in half if we insert anyway (it is very rare that IDs are duplicated) and use a different SQL query in the aggregation phase to dedup the IDs (e.g., using group-by and max()). The benefit is that even though the aggregation query is more expensive, it is only run once per query. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
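The dedup-at-aggregation idea described above can be sketched with SQLite standing in for the JDBC stats database. The table and ID values are illustrative, not Hive's actual schema or code:

```python
import sqlite3

# Sketch of HIVE-2144's proposal: let duplicate IDs from speculative tasks
# land in the table unconditionally, then deduplicate once at aggregation
# time with GROUP BY / MAX, instead of a SELECT-then-INSERT/UPDATE per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partition_stat (id TEXT, row_count INTEGER)")

# Two tasks (one of them speculative) publish stats under the same ID.
conn.executemany(
    "INSERT INTO partition_stat VALUES (?, ?)",
    [("part=a/task_1", 100), ("part=a/task_1", 100), ("part=b/task_2", 250)],
)

# A single aggregation query dedups: MAX per id, then SUM across ids.
(total,) = conn.execute(
    "SELECT SUM(cnt) FROM "
    "(SELECT id, MAX(row_count) AS cnt FROM partition_stat GROUP BY id)"
).fetchone()
print(total)  # 350
```

The aggregation query is heavier than a plain SUM, but as the description notes, it runs once per Hive query rather than twice per published row.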
[jira] [Created] (HIVE-2173) Alter table recover partitions
Alter table recover partitions -- Key: HIVE-2173 URL: https://issues.apache.org/jira/browse/HIVE-2173 Project: Hive Issue Type: New Feature Components: CLI, Metastore Reporter: Ashutosh Chauhan From mailing list thread: http://mail-archives.apache.org/mod_mbox/hive-user/201105.mbox/%3CBANLkTi=R1Dh2sNKyyJm=VsX=yqvx5mb...@mail.gmail.com%3E -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2144) reduce workload generated by JDBCStatsPublisher
[ https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036566#comment-13036566 ] jirapos...@reviews.apache.org commented on HIVE-2144: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/765/ --- Review request for hive. Summary --- Currently, the JDBCStatsPublisher executes two queries per inserted row of statistics: a first query to check whether the ID was inserted by another task, and a second query to insert a new row or update the existing one. The latter occurs very rarely, since duplicates most likely originate from speculative or failed tasks. Currently the schema of the stat table is the following: PARTITION_STAT_TABLE ( ID VARCHAR(255), ROW_COUNT BIGINT ) and does not have any integrity constraints declared. We amend it to: PARTITION_STAT_TABLE ( ID VARCHAR(255) PRIMARY KEY, ROW_COUNT BIGINT ). HIVE-2144 improves performance by greedily performing the insertion statement. Then, instead of executing two queries per row inserted, we can execute one INSERT query. In the case of a primary key constraint violation, we perform a single UPDATE query. The UPDATE query needs to check whether the currently inserted stats are newer than the ones already in the table. This addresses bug HIVE-2144. https://issues.apache.org/jira/browse/HIVE-2144 Diffs - trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 1125140 trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisher.java PRE-CREATION Diff: https://reviews.apache.org/r/765/diff Testing --- TestStatsPublisher JUnit test: - basic behaviour - multiple updates - cleanup of the statistics table after aggregation Standalone testing on the cluster. - insert/analyze queries over non-partitioned/partitioned tables NOTE. 
For the correct behaviour, the primary_key index needs to be created, or the PARTITION_STAT_TABLE table dropped - which triggers creation of the table with the constraint declared. Thanks, Tomasz reduce workload generated by JDBCStatsPublisher --- Key: HIVE-2144 URL: https://issues.apache.org/jira/browse/HIVE-2144 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Tomasz Nykiel Attachments: HIVE-2144.patch In JDBCStatsPublisher, we first try a SELECT query to see if the specific ID was inserted by another task (most likely a speculative or previously failed task). Depending on whether the ID is there, an INSERT or UPDATE query is issued. So there are basically 2x queries per row inserted into the intermediate stats table. This workload could be cut in half if we insert anyway (it is very rare that IDs are duplicated) and use a different SQL query in the aggregation phase to dedup the IDs (e.g., using group-by and max()). The benefit is that even though the aggregation query is more expensive, it is only run once per query. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-2144 reduce workload generated by JDBCStatsPublisher
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/765/ --- Review request for hive. Summary --- Currently, the JDBCStatsPublisher executes two queries per inserted row of statistics: a first query to check whether the ID was inserted by another task, and a second query to insert a new row or update the existing one. The latter occurs very rarely, since duplicates most likely originate from speculative or failed tasks. Currently the schema of the stat table is the following: PARTITION_STAT_TABLE ( ID VARCHAR(255), ROW_COUNT BIGINT ) and does not have any integrity constraints declared. We amend it to: PARTITION_STAT_TABLE ( ID VARCHAR(255) PRIMARY KEY, ROW_COUNT BIGINT ). HIVE-2144 improves performance by greedily performing the insertion statement. Then, instead of executing two queries per row inserted, we can execute one INSERT query. In the case of a primary key constraint violation, we perform a single UPDATE query. The UPDATE query needs to check whether the currently inserted stats are newer than the ones already in the table. This addresses bug HIVE-2144. https://issues.apache.org/jira/browse/HIVE-2144 Diffs - trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 1125140 trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisher.java PRE-CREATION Diff: https://reviews.apache.org/r/765/diff Testing --- TestStatsPublisher JUnit test: - basic behaviour - multiple updates - cleanup of the statistics table after aggregation Standalone testing on the cluster. - insert/analyze queries over non-partitioned/partitioned tables NOTE. For the correct behaviour, the primary_key index needs to be created, or the PARTITION_STAT_TABLE table dropped - which triggers creation of the table with the constraint declared. Thanks, Tomasz
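The greedy-INSERT-then-UPDATE flow the review describes can be sketched with Python's sqlite3, with SQLite standing in for the real JDBC stats database; the table, ID format, and publish helper are illustrative, not the patch's actual code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY mirrors the amended PARTITION_STAT_TABLE schema.
conn.execute(
    "CREATE TABLE partition_stat (id TEXT PRIMARY KEY, row_count INTEGER)"
)

def publish(stat_id, row_count):
    """Greedily INSERT; fall back to a conditional UPDATE only on a
    primary-key violation (the rare duplicate from a speculative task)."""
    try:
        conn.execute("INSERT INTO partition_stat VALUES (?, ?)",
                     (stat_id, row_count))
    except sqlite3.IntegrityError:
        # Only overwrite when the incoming stats are newer (here: larger),
        # echoing the review's condition on the UPDATE query.
        conn.execute(
            "UPDATE partition_stat SET row_count = ? "
            "WHERE id = ? AND row_count < ?",
            (row_count, stat_id, row_count),
        )

publish("part=a/task_1", 100)   # common case: one INSERT, no prior SELECT
publish("part=a/task_1", 120)   # duplicate ID: PK violation, one UPDATE
(count,) = conn.execute(
    "SELECT row_count FROM partition_stat WHERE id = ?", ("part=a/task_1",)
).fetchone()
print(count)  # 120
```

In the common no-duplicate case this is one statement per row instead of two, which is where the halved workload comes from.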
[jira] [Created] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.
unit tests fail consistently when run according to instructions on hive how to contribute page. - Key: HIVE-2174 URL: https://issues.apache.org/jira/browse/HIVE-2174 Project: Hive Issue Type: Bug Components: Build Infrastructure, Testing Infrastructure Affects Versions: 0.7.1, 0.8.0 Reporter: Patrick Hunt Priority: Critical The unit tests fail consistently when run according to the doc on the hive how to contribute page. Specifically, if you: 1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start with a _very_ clean slate) 2) ant clean test tar -logfile ant.log the tests will fail (you can run just bucketmapjoin1.q instead of all the tests; it exhibits this behavior). However, if you instead do the following: 2) ant clean package test tar -logfile ant.log the tests pass (notice the addition of package to the targets). I've tried this on 5 different systems (mix of linux 32/64 bit) and the result is consistent. Running ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q I see the following reason for failure {quote} [junit] 743c743 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 773c773 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 793c793 [junit] numRows 0 [junit] --- [junit] numRows 464 {quote} which leads me to believe it's a metastore issue (statistics?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.
[ https://issues.apache.org/jira/browse/HIVE-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036579#comment-13036579 ] Ning Zhang commented on HIVE-2174: -- Yes, that's a documentation bug. We should always run 'ant package' before running tests or anything else. 'ant package' will download some necessary ivy packages and put the necessary jar files under the build/ directory. For the particular error, I think it is because derby.jar is not present in the build/ directory without 'ant package'. A fix to the code would be to make 'ant test' dependent on 'package'. But the downside is that each time you run some test it calls 'package', which is unnecessary the second time. unit tests fail consistently when run according to instructions on hive how to contribute page. - Key: HIVE-2174 URL: https://issues.apache.org/jira/browse/HIVE-2174 Project: Hive Issue Type: Bug Components: Build Infrastructure, Testing Infrastructure Affects Versions: 0.7.1, 0.8.0 Reporter: Patrick Hunt Priority: Critical The unit tests fail consistently when run according to the doc on the hive how to contribute page. Specifically, if you: 1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start with a _very_ clean slate) 2) ant clean test tar -logfile ant.log the tests will fail (you can run just bucketmapjoin1.q instead of all the tests; it exhibits this behavior). However, if you instead do the following: 2) ant clean package test tar -logfile ant.log the tests pass (notice the addition of package to the targets). I've tried this on 5 different systems (mix of linux 32/64 bit) and the result is consistent. 
Running ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q I see the following reason for failure {quote} [junit] 743c743 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 773c773 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 793c793 [junit] numRows 0 [junit] --- [junit] numRows 464 {quote} which leads me to believe it's a metastore issue (statistics?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.
[ https://issues.apache.org/jira/browse/HIVE-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036592#comment-13036592 ] Patrick Hunt commented on HIVE-2174: Thanks Ning. I updated the following page; perhaps you can review: http://wiki.apache.org/hadoop/Hive/HowToContribute Any suggestions on where else to look for instructions that should be fixed up? I grepped the latest codebase but didn't see anything obvious. unit tests fail consistently when run according to instructions on hive how to contribute page. - Key: HIVE-2174 URL: https://issues.apache.org/jira/browse/HIVE-2174 Project: Hive Issue Type: Bug Components: Build Infrastructure, Testing Infrastructure Affects Versions: 0.7.1, 0.8.0 Reporter: Patrick Hunt Priority: Critical The unit tests fail consistently when run according to the doc on the hive how to contribute page. Specifically, if you: 1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start with a _very_ clean slate) 2) ant clean test tar -logfile ant.log the tests will fail (you can run just bucketmapjoin1.q instead of all the tests; it exhibits this behavior). However, if you instead do the following: 2) ant clean package test tar -logfile ant.log the tests pass (notice the addition of package to the targets). I've tried this on 5 different systems (mix of linux 32/64 bit) and the result is consistent. Running ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q I see the following reason for failure {quote} [junit] 743c743 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 773c773 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 793c793 [junit] numRows 0 [junit] --- [junit] numRows 464 {quote} which leads me to believe it's a metastore issue (statistics?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.
[ https://issues.apache.org/jira/browse/HIVE-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned HIVE-2174: -- Assignee: Patrick Hunt unit tests fail consistently when run according to instructions on hive how to contribute page. - Key: HIVE-2174 URL: https://issues.apache.org/jira/browse/HIVE-2174 Project: Hive Issue Type: Bug Components: Build Infrastructure, Testing Infrastructure Affects Versions: 0.7.1, 0.8.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Critical The unit tests fail consistently when run according to the doc on the hive how to contribute page. Specifically, if you: 1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start with a _very_ clean slate) 2) ant clean test tar -logfile ant.log the tests will fail (you can run just bucketmapjoin1.q instead of all the tests; it exhibits this behavior). However, if you instead do the following: 2) ant clean package test tar -logfile ant.log the tests pass (notice the addition of package to the targets). I've tried this on 5 different systems (mix of linux 32/64 bit) and the result is consistent. Running ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q I see the following reason for failure {quote} [junit] 743c743 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 773c773 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 793c793 [junit] numRows 0 [junit] --- [junit] numRows 464 {quote} which leads me to believe it's a metastore issue (statistics?) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2117) insert overwrite ignoring partition location
[ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned HIVE-2117: -- Assignee: Patrick Hunt insert overwrite ignoring partition location Key: HIVE-2117 URL: https://issues.apache.org/jira/browse/HIVE-2117 Project: Hive Issue Type: Bug Affects Versions: 0.7.0, 0.8.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Blocker Attachments: HIVE-2117_br07.patch, data.txt The following code works differently in 0.5.0 vs 0.7.0. In 0.5.0 the partition location is respected. However, in 0.7.0, while the initial partition is created with the specified location path/parta, the insert overwrite ... results in the partition being written to path/dt=a (note that path is the same in both cases). {code} create table foo_stg (bar INT, car INT); load data local inpath 'data.txt' into table foo_stg; create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION '/user/hive/warehouse/foo4'; alter table foo4 add partition (dt='a') location '/user/hive/warehouse/foo4/parta'; from foo_stg fs insert overwrite table foo4 partition (dt='a') select *; {code} From what I can tell HIVE-1707 introduced this via a change to org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, Map<String, String>, boolean, boolean), specifically: {code} + Path partPath = new Path(tbl.getDataLocation().getPath(), + Warehouse.makePartPath(partSpec)); + + Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath + .toUri().getAuthority(), partPath.toUri().getPath()); {code} Reading the description on HIVE-1707, it seems that this may have been done purposefully; however, given that the partition location is explicitly specified for the partition in question, it seems like it should be honored (especially given that the table location has not changed). This difference in behavior is causing a regression in existing production Hive based code. I'd like to take a stab at addressing this, any suggestions? 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2174) unit tests fail consistently when run according to instructions on hive how to contribute page.
[ https://issues.apache.org/jira/browse/HIVE-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036597#comment-13036597 ] Carl Steinbach commented on HIVE-2174: -- If the test target has a dependency on package, then this dependency should be made explicit in build.xml. Right now test indirectly depends on jar, which is why running 'ant test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q' fails. Also, instead of modifying the test target dependencies, I think we should try to adhere to ant conventions and modify the test.classpath so that it will work after running the jar target. unit tests fail consistently when run according to instructions on hive how to contribute page. - Key: HIVE-2174 URL: https://issues.apache.org/jira/browse/HIVE-2174 Project: Hive Issue Type: Bug Components: Build Infrastructure, Testing Infrastructure Affects Versions: 0.7.1, 0.8.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Critical The unit tests fail consistently when run according to the doc on the hive how to contribute page. Specifically, if you: 1) checkout the code afresh (or 'git clean -xdf' - basically be sure to start with a _very_ clean slate) 2) ant clean test tar -logfile ant.log the tests will fail (you can run just bucketmapjoin1.q instead of all the tests; it exhibits this behavior). However, if you instead do the following: 2) ant clean package test tar -logfile ant.log the tests pass (notice the addition of package to the targets). I've tried this on 5 different systems (mix of linux 32/64 bit) and the result is consistent. Running ant clean test -Dtestcase=TestCliDriver -Dqfile=bucketmapjoin1.q I see the following reason for failure {quote} [junit] 743c743 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 773c773 [junit] numRows 0 [junit] --- [junit] numRows 464 [junit] 793c793 [junit] numRows 0 [junit] --- [junit] numRows 464 {quote} which leads me to believe it's a metastore issue (statistics?) 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location
[ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated HIVE-2117: --- Attachment: HIVE-2117_trunk.patch HIVE-2117_br07.patch Updated patch files for branch 0.7 and trunk. This fixes the problem -- I've also added a new test which verifies the location used for the partition. I verified the test failed before my patch and passes after applying it. insert overwrite ignoring partition location Key: HIVE-2117 URL: https://issues.apache.org/jira/browse/HIVE-2117 Project: Hive Issue Type: Bug Affects Versions: 0.7.0, 0.8.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Priority: Blocker Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt The following code works differently in 0.5.0 vs 0.7.0. In 0.5.0 the partition location is respected. However, in 0.7.0, while the initial partition is created with the specified location path/parta, the insert overwrite ... results in the partition being written to path/dt=a (note that path is the same in both cases). 
{code} create table foo_stg (bar INT, car INT); load data local inpath 'data.txt' into table foo_stg; create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION '/user/hive/warehouse/foo4'; alter table foo4 add partition (dt='a') location '/user/hive/warehouse/foo4/parta'; from foo_stg fs insert overwrite table foo4 partition (dt='a') select *; {code} From what I can tell HIVE-1707 introduced this via a change to org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, Map<String, String>, boolean, boolean), specifically: {code} + Path partPath = new Path(tbl.getDataLocation().getPath(), + Warehouse.makePartPath(partSpec)); + + Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath + .toUri().getAuthority(), partPath.toUri().getPath()); {code} Reading the description on HIVE-1707, it seems that this may have been done purposefully; however, given that the partition location is explicitly specified for the partition in question, it seems like it should be honored (especially given that the table location has not changed). This difference in behavior is causing a regression in existing production Hive based code. I'd like to take a stab at addressing this, any suggestions? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-trunk-h0.21 #739
See https://builds.apache.org/hudson/job/Hive-trunk-h0.21/739/changes

Changes:

[sdong] Test commit permission

--
[...truncated 30340 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-05-19_19-18-34_477_1564598283089016504/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0
[junit] 2011-05-19 19:18:37,574 null map = 100%, reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-05-19_19-18-34_477_1564598283089016504/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history file=https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201105191918_1447962702.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 'https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 'https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-05-19_19-18-39_103_6613876364711923264/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-05-19_19-18-39_103_6613876364711923264/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history file=https://builds.apache.org/hudson/job/Hive-trunk-h0.21/ws/hive/build/service/tmp/hive_job_log_hudson_201105191918_1606305529.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table
[jira] [Commented] (HIVE-2036) Update bitmap indexes for automatic usage
[ https://issues.apache.org/jira/browse/HIVE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036685#comment-13036685 ] Russell Melick commented on HIVE-2036: --

To expand a bit on Marquis' comments: in CompactIndexHandler.getIndexPredicateAnalyzer(), we instantiate a predicate analyzer. My theory is that you're going to want a whole new PredicateAnalyzer class to deal with bitmaps, which you'll then instantiate in a very similar way inside BitmapIndexHandler. You can also see there how we only search for columns on which we have indexes. This will need to be modified, since it currently only allows columns from a single index.

You may also want to rewrite some of the logic in IndexWhereProcessor.process():110. It currently loops through every available index and asks it to do a rewrite. Perhaps it should instead loop through every index type and try to find the rewrites that are possible using only indexes of that type.

If you look at IndexPredicateAnalyzer:123, you can see where it makes sure that all the parent operators are AND operations. It should be easy to modify this to allow OR operations, but I'm not sure that simply allowing them under the current system would maintain logical correctness. It's probably better to start off with just ANDs.

The pushedPredicate is the important thing returned by the predicate analyzer: it is the part of the predicate the analyzer was able to recognize and process, and it's the tree you'll want to use to generate the bitmap query. The residual predicate is what it couldn't process. There's a separate JIRA open (HIVE-2115) to use the residual to cut down on the remaining work.

The query generation lives in the index handlers' generateIndexQuery(...). You'll definitely need more logic than the simple call to decomposedPredicate.pushedPredicate.getExprString() that is in CompactIndexHandler. There are a few spots where hive.index.compact.file is used; these may need to be generalized.
However, Marquis may have already taken care of this with the bitmap work. I don't remember what the new name for it was (I think it's hive.index.blockfilter.file), but it's probably easiest to look in one of his unit tests for it.

The last thing I can think of is that having multiple index types on a single table, or queries that use multiple tables, may become an issue. I created HIVE-2128 to deal with the multiple tables. Good luck!

Update bitmap indexes for automatic usage - Key: HIVE-2036 URL: https://issues.apache.org/jira/browse/HIVE-2036 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: Russell Melick Assignee: Jeffrey Lym

HIVE-1644 will provide automatic usage of indexes, and HIVE-1803 adds bitmap index support. The bitmap code will need to be extended after it is committed to enable automatic use of indexing. Most work will be focused in the BitmapIndexHandler, which needs to generate the re-entrant QL index query. There may also be significant work in the IndexPredicateAnalyzer to support predicates with ORs, instead of just the ANDs it handles currently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
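The AND-only restriction and the pushed/residual split discussed in the comment above can be sketched in isolation. This is a hypothetical, simplified model and not Hive's actual code: `ExprNode` and `AndOnlyAnalyzer` are illustrative names, standing in for Hive's expression tree and IndexPredicateAnalyzer. The idea is that a leaf predicate is only "pushed" when every ancestor in the expression tree is an AND; anything sitting under an OR falls into the residual instead.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an expression tree node: either an interior
// operator ("AND"/"OR") with children, or a leaf comparison like "a > 1".
class ExprNode {
    final String op;
    final List<ExprNode> children = new ArrayList<>();

    ExprNode(String op) { this.op = op; }

    ExprNode add(ExprNode child) { children.add(child); return this; }
}

public class AndOnlyAnalyzer {
    // Collect leaf predicates whose ancestors are all ANDs into `pushed`;
    // any leaf reached through an OR goes into `residual`, mirroring the
    // pushedPredicate / residualPredicate split described above.
    static void analyze(ExprNode node, boolean underOr,
                        List<String> pushed, List<String> residual) {
        if (node.children.isEmpty()) {          // leaf predicate
            (underOr ? residual : pushed).add(node.op);
            return;
        }
        boolean nextUnderOr = underOr || node.op.equals("OR");
        for (ExprNode c : node.children) {
            analyze(c, nextUnderOr, pushed, residual);
        }
    }

    public static void main(String[] args) {
        // (a > 1 AND (b = 2 OR c = 3)) -> only "a > 1" is pushable
        ExprNode tree = new ExprNode("AND")
            .add(new ExprNode("a > 1"))
            .add(new ExprNode("OR")
                .add(new ExprNode("b = 2"))
                .add(new ExprNode("c = 3")));
        List<String> pushed = new ArrayList<>();
        List<String> residual = new ArrayList<>();
        analyze(tree, false, pushed, residual);
        System.out.println("pushed=" + pushed + " residual=" + residual);
        // prints: pushed=[a > 1] residual=[b = 2, c = 3]
    }
}
```

Allowing ORs would mean accepting subtrees like the one under `OR` here into the pushed predicate, which is exactly where the logical-correctness concern in the comment comes from: the rewritten index query must return a superset of the rows the original predicate would match.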