[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.7.patch New patch which I believe takes care of all the issues in the review for patch 6. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open) Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2042) In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java
[ https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009567#comment-13009567 ] Amareshwari Sriramadasu commented on HIVE-2042: --- Running tests. Will commit if tests pass. In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java -- Key: HIVE-2042 URL: https://issues.apache.org/jira/browse/HIVE-2042 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch 1) In error scenario PrintStream may not be closed in execute() of ExplainTask.java 2) In error scenario InputStream may not be closed in checkJobTracker() of Throttle.java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Work around for using OR in Joins
I want to use OR in the join expression, but it seems only AND is supported as of now. I have a workaround using De Morgan's law {C1 OR C2 = !(!C1 AND !C2)}, but it would be nice if somebody could point me to the location in the code base that would need modification to support OR in the join expression. Thanks, MIS.
Re: Work around for using OR in Joins
Found it at *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java*, line no. 1122. There is some concern mentioned there that supporting OR would lead to data explosion. Is this discussed or documented in a little more detail somewhere? If so, some pointers would be helpful. Thanks, MIS. On Tue, Mar 22, 2011 at 1:19 PM, MIS misapa...@gmail.com wrote: I want to use OR in the join expression, but it seems only AND is supported as of now. I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1 AND !C2)} , but it would be nice if somebody can point me to the location in code base that would need modification to support the OR in the join expression. Thanks, MIS.
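The identity behind the workaround can be checked exhaustively over all truth values. A minimal sketch in Python; the function names are made up for illustration and are not part of Hive, and the check says nothing about whether Hive's join analyzer accepts the negated form.

```python
from itertools import product


def or_direct(c1: bool, c2: bool) -> bool:
    """The condition as one would like to write it: C1 OR C2."""
    return c1 or c2


def or_via_de_morgan(c1: bool, c2: bool) -> bool:
    """The workaround form: !(!C1 AND !C2)."""
    return not ((not c1) and (not c2))


def identity_holds() -> bool:
    """Compare both forms on every combination of truth values."""
    return all(
        or_direct(a, b) == or_via_de_morgan(a, b)
        for a, b in product([False, True], repeat=2)
    )
```

The rewrite only changes how the condition is spelled. It does not change how many row pairs satisfy it, so the data explosion concern mentioned in the source comment would apply to either form.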
[jira] [Commented] (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the l
[ https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009586#comment-13009586 ]

Chinna Rao Lalam commented on HIVE-2031:

Updated the patch with test cases.

Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load statement.

Key: HIVE-2031
URL: https://issues.apache.org/jira/browse/HIVE-2031
Project: Hive
Issue Type: Bug
Components: Logging
Affects Versions: 0.7.0
Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Attachments: HIVE-2031.2.patch, HIVE-2031.patch

Loading into a partitioned table having 2 partitions while specifying only one partition in the load statement fails and logs the following exception message.

{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not found '21Oct'
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:685)
    at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764)
    at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
{noformat}

The message needs to be corrected so that it states the actual root cause.
[jira] [Updated] (HIVE-2042) In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java
[ https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-2042: -- Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) I just committed this. Thanks Chinna. In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java -- Key: HIVE-2042 URL: https://issues.apache.org/jira/browse/HIVE-2042 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.8.0 Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch 1) In error scenario PrintStream may not be closed in execute() of ExplainTask.java 2) In error scenario InputStream may not be closed in checkJobTracker() of Throttle.java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2042) In error scenario some opened streams may not closed
[ https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-2042: -- Summary: In error scenario some opened streams may not closed (was: In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java) In error scenario some opened streams may not closed Key: HIVE-2042 URL: https://issues.apache.org/jira/browse/HIVE-2042 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.0 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.8.0 Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch 1) In error scenario PrintStream may not be closed in execute() of ExplainTask.java 2) In error scenario InputStream may not be closed in checkJobTracker() of Throttle.java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
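The class of leak described in this issue, a stream left open when an exception interrupts the code that writes to it, is usually addressed by closing the stream in a cleanup block that runs on every exit path. The sketch below illustrates that pattern in Python; it is not the committed patch (Hive is written in Java, where the equivalent is a try/finally around the stream), and `write_plan` and its `fail` flag are hypothetical names used only for the demonstration.

```python
import io
from typing import Iterable


def write_plan(stream: io.TextIOBase, lines: Iterable[str], fail: bool = False) -> None:
    """Write lines to the stream and close it, even if writing fails.

    `fail` is a hypothetical switch that simulates the error scenario.
    """
    try:
        for line in lines:
            stream.write(line + "\n")
        if fail:
            raise RuntimeError("simulated failure while writing the plan")
    finally:
        # Runs on both the normal path and the error path.
        stream.close()
```

Without the `finally` block, the simulated failure would skip `close()` and leave the stream open, which is the situation the issue reports for `execute()` and `checkJobTracker()`.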
[jira] [Commented] (HIVE-2063) jdbc return only 1 collumn
[ https://issues.apache.org/jira/browse/HIVE-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009623#comment-13009623 ]

Alexey Diomin commented on HIVE-2063:

Wait, this bug is very interesting. It reproduces on hadoop-0.20.2, but on Cloudera CDH3B4 the bug does not reproduce, and applying the patch breaks correct parsing of the input row, since the delimiter in the input row has code '1' (the default).

jdbc return only 1 collumn

Key: HIVE-2063
URL: https://issues.apache.org/jira/browse/HIVE-2063
Project: Hive
Issue Type: Bug
Components: JDBC
Affects Versions: 0.7.0
Reporter: Alexey Diomin
Assignee: Alexey Diomin
Priority: Critical
Attachments: HIVE-2063.patch, HIVE-2063.patch

We do not set a separator for the data, so all data is returned in the first column and all other fields are set to NULL. In addition, we get:

WARNING: Missing fields! Expected 27 fields but only got 1! Ignoring similar problems.

This is a regression after HIVE-1378.

Bug:
the server side uses the delimiter '\t' for fields
the client side uses the default delimiter with code '1'
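The symptom in the description follows directly from a delimiter mismatch. A minimal sketch in Python, assuming a row serialized with '\t' on one side and split on the character with code 1 on the other; `parse_row` is a hypothetical helper, not the Hive JDBC driver's parsing code.

```python
from typing import List, Optional


def parse_row(row: str, delimiter: str, expected: int) -> List[Optional[str]]:
    """Split a serialized row and pad missing fields with None (NULL)."""
    fields: List[Optional[str]] = list(row.split(delimiter))
    missing = expected - len(fields)
    if missing > 0:
        fields.extend([None] * missing)
    return fields[:expected]


# The row is written with tabs between fields.
SERVER_ROW = "238\tval_238\t2008-04-08"

# Splitting on the wrong delimiter leaves the whole row in the first field.
mismatched = parse_row(SERVER_ROW, "\x01", 3)
# Splitting on the delimiter that was actually used recovers all fields.
matched = parse_row(SERVER_ROW, "\t", 3)
```

With the mismatch, the whole row lands in the first field and the rest are padded with NULL, which matches the "Expected 27 fields but only got 1" warning in shape. The comment above also suggests the reverse can occur: where rows already arrive delimited by code 1, switching the client to '\t' would break parsing in the same way.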
[jira] [Updated] (HIVE-1538) FilterOperator is applied twice with ppd on.
[ https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-1538: -- Attachment: patch-1538-2.txt Added configuration hive.ppd.remove.duplicatefilters, with default value of true. Updated ppd tests to run with both configuration off and on. FilterOperator is applied twice with ppd on. Key: HIVE-1538 URL: https://issues.apache.org/jira/browse/HIVE-1538 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: patch-1538-1.txt, patch-1538-2.txt, patch-1538.txt With hive.optimize.ppd set to true, FilterOperator is applied twice. And it seems second operator is always filtering zero rows. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2049) Push down partition pruning to JDO filtering for a subset of partition predicates
[ https://issues.apache.org/jira/browse/HIVE-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2049:

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Ning

Push down partition pruning to JDO filtering for a subset of partition predicates

Key: HIVE-2049
URL: https://issues.apache.org/jira/browse/HIVE-2049
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-2049.2.patch, HIVE-2049.3.patch, HIVE-2049.4.patch, HIVE-2049.patch

Several tasks:
- expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that PartitionPruner can use that for certain partition predicates.
- figure out a safe subset of partition predicates that can be pushed down to JDO filtering.

My initial testing for the 2nd part shows that equality queries with AND/OR can be pushed down and return correct results. However, range queries on partition columns gave an NPE in the JDO execute() function. This might be a bug in the JDO query string itself, but we need to figure it out and heavily test all cases.
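The "safe subset" idea can be pictured as a recursive check over a predicate tree: push the predicate down only if every leaf is an equality comparison and every inner node is AND or OR. The sketch below is in Python with a made-up nested-tuple representation; it only mirrors the rule reported from initial testing in this issue and is not the logic of the committed patch.

```python
from typing import Tuple

# Hypothetical representation: ("AND" | "OR", child, child, ...) for inner
# nodes and (operator, column, literal) for leaves.
Expr = Tuple


def is_pushable(expr: Expr) -> bool:
    """Return True if the tree uses only AND/OR over equality leaves."""
    op = expr[0]
    if op in ("AND", "OR"):
        return all(is_pushable(child) for child in expr[1:])
    return op == "="


equality_only = (
    "AND",
    ("=", "ds", "2008-04-08"),
    ("OR", ("=", "hr", "11"), ("=", "hr", "12")),
)
with_range = ("AND", ("=", "ds", "2008-04-08"), (">", "hr", "11"))
```

Under this rule `equality_only` would be sent to the metastore filter, while `with_range` would fall back to client-side evaluation because of the range comparison on `hr`.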
[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Status: Patch Available (was: Open) Please review asap as there are lots of changes to q.out files and any delay may cause another conflict/resolution cycle. LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it. -- Key: HIVE-2003 URL: https://issues.apache.org/jira/browse/HIVE-2003 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt The table/partition being loaded is not being added to outputs in the LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009739#comment-13009739 ] Namit Jain commented on HIVE-2003: -- I will take a look right away LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it. -- Key: HIVE-2003 URL: https://issues.apache.org/jira/browse/HIVE-2003 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt The table/partition being loaded is not being added to outputs in the LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009772#comment-13009772 ] Namit Jain commented on HIVE-2003: -- Instead of adding a new configuration parameter which is being checked in EnforceReadOnlyTables, it might be easier to remove EnforceReadOnlyTables from the hive.exec.pre.hooks at creation time. But, this can be done in a follow-up also (if other things look good). Will commit if tests pass, please file a follow-up jira for the cleanup mentioned above. LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it. -- Key: HIVE-2003 URL: https://issues.apache.org/jira/browse/HIVE-2003 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt The table/partition being loaded is not being added to outputs in the LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-0.7.0-h0.20 #50
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/50/ -- [...truncated 27029 lines...] [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.src [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt [junit] Loading data to table default.src1 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src1 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq' INTO TABLE src_sequencefile [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq [junit] Loading data to table default.src_sequencefile [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.seq' INTO TABLE src_sequencefile [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src_sequencefile [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq' INTO TABLE src_thrift [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq [junit] Loading data to table default.src_thrift [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/complex.seq' INTO TABLE src_thrift [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src_thrift [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt' INTO TABLE src_json [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt [junit] Loading data to table default.src_json [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/json.txt' INTO TABLE src_json [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src_json [junit] OK [junit] diff https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/test/logs/negative/wrong_distinct1.q.out https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/ql/src/test/results/compiler/errors/wrong_distinct1.q.out [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103221207_951496250.txt [junit] Done query: wrong_distinct1.q 
[junit] Begin query: wrong_distinct2.q [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/ql/tmp/hive_job_log_hudson_201103221207_1182796463.txt [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11') [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Copying file: https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.srcpart partition (ds=2008-04-08, hr=11) [junit] POSTHOOK: query: LOAD DATA
Build failed in Jenkins: Hive-trunk-h0.20 #632
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/632/ -- [...truncated 28061 lines...] [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-22_12-11-03_852_331101495470767803/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2011-03-22 12:11:06,951 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-22_12-11-03_852_331101495470767803/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103221211_333569579.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-22_12-11-08_486_3558028685216045491/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-22_12-11-08_486_3558028685216045491/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] 
POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103221211_1627999456.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int)
[jira] [Created] (HIVE-2069) NullPointerException on getSchemas
NullPointerException on getSchemas -- Key: HIVE-2069 URL: https://issues.apache.org/jira/browse/HIVE-2069 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Bennie Schut Assignee: Bennie Schut Fix For: 0.8.0 Calling getSchemas will cause a nullpointerexception -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2069) NullPointerException on getSchemas
[ https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009825#comment-13009825 ]

Bennie Schut commented on HIVE-2069:

java.lang.NullPointerException
    at java.util.ArrayList.<init>(ArrayList.java:131)
    at org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.<init>(HiveMetaDataResultSet.java:32)
    at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.<init>(HiveDatabaseMetaData.java:481)
    at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:480)
    at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:475)
    at org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas(TestJdbcDriver.java:488)

Probably introduced in HIVE-1126. getCatalogs works correctly, but getSchemas wasn't tested.

NullPointerException on getSchemas

Key: HIVE-2069
URL: https://issues.apache.org/jira/browse/HIVE-2069
Project: Hive
Issue Type: Bug
Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Fix For: 0.8.0

Calling getSchemas will cause a nullpointerexception
[jira] [Updated] (HIVE-2069) NullPointerException on getSchemas
[ https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bennie Schut updated HIVE-2069: --- Attachment: HIVE-2069.1.patch.txt This patch includes a fix and a test which can be used to reproduce the nullpointer. NullPointerException on getSchemas -- Key: HIVE-2069 URL: https://issues.apache.org/jira/browse/HIVE-2069 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Bennie Schut Assignee: Bennie Schut Fix For: 0.8.0 Attachments: HIVE-2069.1.patch.txt Calling getSchemas will cause a nullpointerexception -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2069) NullPointerException on getSchemas
[ https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bennie Schut updated HIVE-2069: --- Release Note: Fix for NullPointerException on the jdbc driver on getSchemas Status: Patch Available (was: Open) NullPointerException on getSchemas -- Key: HIVE-2069 URL: https://issues.apache.org/jira/browse/HIVE-2069 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Bennie Schut Assignee: Bennie Schut Fix For: 0.8.0 Attachments: HIVE-2069.1.patch.txt Calling getSchemas will cause a nullpointerexception -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2069) NullPointerException on getSchemas
[ https://issues.apache.org/jira/browse/HIVE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009841#comment-13009841 ] Ning Zhang commented on HIVE-2069: -- +1. will commit if tests pass. NullPointerException on getSchemas -- Key: HIVE-2069 URL: https://issues.apache.org/jira/browse/HIVE-2069 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Bennie Schut Assignee: Bennie Schut Fix For: 0.8.0 Attachments: HIVE-2069.1.patch.txt Calling getSchemas will cause a nullpointerexception -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-2003: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Krishna LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it. -- Key: HIVE-2003 URL: https://issues.apache.org/jira/browse/HIVE-2003 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt The table/partition being loaded is not being added to outputs in the LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-2050. batch processing partition pruning process
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/522/

Review request for hive.

Summary

Introducing a new metastore API to retrieve a list of partitions in batch.

Diffs

trunk/metastore/if/hive_metastore.thrift 1084243
trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1084243
trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1084243
trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 1084243
trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 1084243
trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 1084243
trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 1084243
trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 1084243
trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1084243
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1084243
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1084243
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1084243
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1084243
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1084243
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1084243
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java 1084243
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1084243

Diff: https://reviews.apache.org/r/522/diff

Testing

Thanks,
Ning
[jira] [Updated] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-2050:
-----------------------------
    Status: Patch Available (was: Open)

> batch processing partition pruning process
> ------------------------------------------
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
> For partition predicates that cannot be pushed down to JDO filtering (HIVE-2049), we should fall back to the old approach of listing all partition names first and using Hive's expression evaluation engine to select the correct partitions. The partition pruner should then hand Hive a list of partition names and get back a list of Partition objects (this should be added to the Hive API).
> A possible optimization is for the partition pruner to give Hive a set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), with the JDO query formulated as range queries. Range queries are possible because the first step lists all partition names in sorted order. It is easy to come up with a range, and the JDO range query results are guaranteed to be equivalent to those of the query with a list of partition names.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
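The range optimization proposed in the HIVE-2050 description can be illustrated with a minimal sketch. It assumes the full list of partition names is already sorted and that pruning has produced the set of selected names; it then merges runs of consecutive selected names into closed ranges. The class and method names (PartitionRanges, Range, toRanges) are hypothetical and are not part of the Hive or metastore API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: not Hive code. */
public class PartitionRanges {

    /** Closed range [low, high] of partition names (hypothetical helper type). */
    public static final class Range {
        public final String low;
        public final String high;

        Range(String low, String high) {
            this.low = low;
            this.high = high;
        }

        @Override
        public String toString() {
            return "[" + low + ", " + high + "]";
        }
    }

    /**
     * Walks the sorted list of all partition names once and emits one range
     * per run of consecutive names that were selected by the pruner.
     */
    public static List<Range> toRanges(List<String> sortedAll, Set<String> selected) {
        List<Range> ranges = new ArrayList<>();
        String low = null;
        String high = null;
        for (String name : sortedAll) {
            if (selected.contains(name)) {
                if (low == null) {
                    low = name;      // a new run starts here
                }
                high = name;         // extend the current run
            } else if (low != null) {
                ranges.add(new Range(low, high)); // an unselected name ends the run
                low = null;
            }
        }
        if (low != null) {
            ranges.add(new Range(low, high));     // close the trailing run
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("ts=01", "ts=02", "ts=03", "ts=04", "ts=05");
        Set<String> selected = new HashSet<>(Arrays.asList("ts=01", "ts=02", "ts=04"));
        // prints [[ts=01, ts=02], [ts=04, ts=04]]
        System.out.println(toRanges(all, selected));
    }
}
```

Because every range covers only names that are adjacent in the full sorted list and all selected, a range query over [low, high] should return the same partitions as a query by name list, provided the store compares names in the same order used for sorting (an assumption of this sketch).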
[jira] [Updated] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-2050:
-----------------------------
    Attachment: HIVE-2050.patch

Uploading a new patch for review. Still running tests.
The review board request: https://reviews.apache.org/r/522/

> batch processing partition pruning process
> ------------------------------------------
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
> For partition predicates that cannot be pushed down to JDO filtering (HIVE-2049), we should fall back to the old approach of listing all partition names first and using Hive's expression evaluation engine to select the correct partitions. The partition pruner should then hand Hive a list of partition names and get back a list of Partition objects (this should be added to the Hive API).
> A possible optimization is for the partition pruner to give Hive a set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), with the JDO query formulated as range queries. Range queries are possible because the first step lists all partition names in sorted order. It is easy to come up with a range, and the JDO range query results are guaranteed to be equivalent to those of the query with a list of partition names.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Meanings of privileges
Hi all,

I'm trying to understand the meaning of some of the privileges in the system, and I'm a bit stumped on what some of them actually do. Privileges that confuse me:

INDEX - my best guess is that this allows me to create/drop indexes on a table? Is it the case that if I have select access on a table, I can use any index that exists on a table?

LOCK - Presumably this allows users to lock or unlock a table, so maybe a better question is: are these locks like mutexes, where only I can access the table, or is this literally locking down the table, so it can't be modified in any way?

SHOW_DATABASE - I'm not sure what the scope of this one is: if I don't have show_database access, can I not use the show database command? Or does this extend to not being able to see the tables within a database?

It seems like you can grant some privileges on objects that don't have a lot of meaning, i.e. create access on a table doesn't seem to have a lot of semantic value, unless Hive requires that permission to create indexes on a table, or something along those lines. Similarly, I'm having a hard time rationalizing why I can grant SHOW_DATABASE on a table.

Thanks a lot,
Jon
Re: Meanings of privileges
> INDEX - my best guess is that this allows me to create/drop indexes on a table?

Yes. It is there for this purpose.

> Is it the case that if I have select access on a table, I can use any index that exists on a table?

No. An index is also a table now, so you need to have access to both of them.

> LOCK - Presumably this allows users to lock or unlock a table, so maybe a better question is: are these locks like mutexes, where only I can access the table, or is this literally locking down the table, so it can't be modified in any way?

Yes. If only you have the lock privilege on this table, and concurrency is enabled, no one will be able to run anything against the table.

> SHOW_DATABASE - I'm not sure what the scope of this one is: if I don't have show_database access, can I not use the show database command?

If you don't have show_database access, you should not be able to use the show database command. I do not think this privilege is supported today.

> create access on a table doesn't seem to have a lot of semantic value

I think create on a table means create partition.

> Similarly, I'm having a hard time rationalizing why I can grant SHOW_DATABASE on a table.

This should be a bug. Basically each privilege has its own set of scopes (it can apply at the db level, table level, column level, or user level, non-exclusively).

Thanks
Yongqiang

On Tue, Mar 22, 2011 at 6:30 PM, Jonathan Natkins na...@cloudera.com wrote:
> Hi all,
>
> I'm trying to understand the meaning of some of the privileges in the system, and I'm a bit stumped on what some of them actually do. Privileges that confuse me:
>
> INDEX - my best guess is that this allows me to create/drop indexes on a table? Is it the case that if I have select access on a table, I can use any index that exists on a table?
>
> LOCK - Presumably this allows users to lock or unlock a table, so maybe a better question is: are these locks like mutexes, where only I can access the table, or is this literally locking down the table, so it can't be modified in any way?
>
> SHOW_DATABASE - I'm not sure what the scope of this one is: if I don't have show_database access, can I not use the show database command? Or does this extend to not being able to see the tables within a database?
>
> It seems like you can grant some privileges on objects that don't have a lot of meaning, i.e. create access on a table doesn't seem to have a lot of semantic value, unless Hive requires that permission to create indexes on a table, or something along those lines. Similarly, I'm having a hard time rationalizing why I can grant SHOW_DATABASE on a table.
>
> Thanks a lot,
> Jon
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009981#comment-13009981 ]

Krishna Kumar commented on HIVE-2065:
-------------------------------------

Hmm. #3 is taking me a bit further than I originally thought. I assume being able to read an RCFile as a SequenceFile is required, while being able to write an RCFile via the SequenceFile interface is desirable.

Having made changes so that the record length is correctly set, the following changes are also required, IIUC, to make sure that the RCFile is handled correctly as a sequence file:

- the second field should be the key length (4 + compressed/plain key contents)
- the key class (KeyBuffer) must be made responsible for reading/writing the next field (the plain key contents length) as well as for compression/decompression of the key contents
- the value class (ValueBuffer) related changes will be trickier. Since the value is not compressed as a unit, we cannot use the record-compressed format. We need to mark the records as plain records and move the codec to a metadata entry. Then the ValueBuffer class will work correctly with the SequenceFile implementation.

Thoughts? Worth it?

> RCFile issues
> -------------
>
> Key: HIVE-2065
> URL: https://issues.apache.org/jira/browse/HIVE-2065
> Project: Hive
> Issue Type: Bug
> Reporter: Krishna Kumar
> Assignee: Krishna Kumar
> Priority: Minor
> Attachments: Slide1.png, proposal.png
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
> {code}
> int keyLength = key.getSize();
> if (keyLength < 0) {
>   throw new IOException("negative length keys not allowed: " + key);
> }
>
> out.writeInt(keyLength + valueLength); // total record length
> out.writeInt(keyLength); // key portion length
> if (!isCompressed()) {
>   out.writeInt(keyLength);
>   key.write(out); // key
> } else {
>   keyCompressionBuffer.reset();
>   keyDeflateFilter.resetState();
>   key.write(keyDeflateOut);
>   keyDeflateOut.flush();
>   keyDeflateFilter.finish();
>   int compressedKeyLen = keyCompressionBuffer.getLength();
>   out.writeInt(compressedKeyLen);
>   out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
> }
> {code}
> 3. For sequence file compatibility, the compressed key length should be the next field to record length, not the uncompressed key length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
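Point 2 of the HIVE-2065 description says the record length is written before the key is compressed, so it reflects the uncompressed key size. A minimal sketch of the alternative ordering follows: compress the key into a buffer first, then derive both header fields from the stored size. This is not the RCFile implementation; the class RecordHeaderSketch, its methods, and the two-field layout are assumptions made for illustration, and java.util.zip stands in for whatever codec the file actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

/** Illustrative only: not the RCFile writer. */
public class RecordHeaderSketch {

    /** Deflates the key bytes; a stand-in for the key codec in the quoted snippet. */
    public static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(buf)) {
            deflate.write(plain);
        } // closing the stream finishes the deflater and flushes into buf
        return buf.toByteArray();
    }

    /**
     * Writes a simplified header (assumed layout): total record length,
     * stored key length, then the key bytes. Lengths are computed after
     * compression so they match what is actually written.
     */
    public static void writeRecordHeader(DataOutputStream out, byte[] plainKey,
                                         int valueLength, boolean compressed)
            throws IOException {
        if (valueLength < 0) {
            throw new IOException("negative value length not allowed: " + valueLength);
        }
        byte[] storedKey = compressed ? compress(plainKey) : plainKey;
        out.writeInt(storedKey.length + valueLength); // total record length
        out.writeInt(storedKey.length);               // key length as stored
        out.write(storedKey);                         // key contents
    }

    public static void main(String[] args) throws IOException {
        byte[] key = new byte[1024]; // highly compressible: all zeros
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeRecordHeader(new DataOutputStream(sink), key, 100, true);
        System.out.println("header + key bytes written: " + sink.size());
    }
}
```

The cost of this ordering is that the compressed key must be buffered before any header field is written, which the quoted snippet already does through keyCompressionBuffer.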
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009982#comment-13009982 ]

He Yongqiang commented on HIVE-2065:
------------------------------------

If being compatible with SequenceFile does not break RCFile's backward compatibility, it should be OK. But even after that, Hive still won't be able to process it as a sequence file because of Hive's SerDe layer.

> RCFile issues
> -------------
>
> Key: HIVE-2065
> URL: https://issues.apache.org/jira/browse/HIVE-2065
> Project: Hive
> Issue Type: Bug
> Reporter: Krishna Kumar
> Assignee: Krishna Kumar
> Priority: Minor
> Attachments: Slide1.png, proposal.png
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
> {code}
> int keyLength = key.getSize();
> if (keyLength < 0) {
>   throw new IOException("negative length keys not allowed: " + key);
> }
>
> out.writeInt(keyLength + valueLength); // total record length
> out.writeInt(keyLength); // key portion length
> if (!isCompressed()) {
>   out.writeInt(keyLength);
>   key.write(out); // key
> } else {
>   keyCompressionBuffer.reset();
>   keyDeflateFilter.resetState();
>   key.write(keyDeflateOut);
>   keyDeflateOut.flush();
>   keyDeflateFilter.finish();
>   int compressedKeyLen = keyCompressionBuffer.getLength();
>   out.writeInt(compressedKeyLen);
>   out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
> }
> {code}
> 3. For sequence file compatibility, the compressed key length should be the next field to record length, not the uncompressed key length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009993#comment-13009993 ]

Ning Zhang commented on HIVE-2050:
----------------------------------

Passed all unit tests.

> batch processing partition pruning process
> ------------------------------------------
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
> For partition predicates that cannot be pushed down to JDO filtering (HIVE-2049), we should fall back to the old approach of listing all partition names first and using Hive's expression evaluation engine to select the correct partitions. The partition pruner should then hand Hive a list of partition names and get back a list of Partition objects (this should be added to the Hive API).
> A possible optimization is for the partition pruner to give Hive a set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), with the JDO query formulated as range queries. Range queries are possible because the first step lists all partition names in sorted order. It is easy to come up with a range, and the JDO range query results are guaranteed to be equivalent to those of the query with a list of partition names.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Work around for using OR in Joins
Joins with OR conditions are not currently supported by Hive. I think that even if you rewrite the condition to use only NOT and AND, the results may be wrong.

It is quite hard to implement joins of arbitrary tables with OR conditions in a MapReduce framework. It is straightforward to implement them with a nested-loop join, but due to the nature of distributed processing, a nested-loop join cannot be implemented in an efficient and scalable way in MapReduce: each mapper needs to join a split of the LHS table with the whole RHS table, which could be terabytes.

The regular (reduce-side) join in Hive is essentially a sort-merge join operator. With that in mind, it is hard to implement OR conditions in the sort-merge join.

One exception is the map-side join, which assumes the RHS table is small and will be read fully into each mapper. Currently the map-side join in Hive is a hash-based join operator. You can implement a nested-loop map-side join operator to enable any join condition, including OR.

On Mar 22, 2011, at 1:39 AM, MIS wrote:
> Found it at *org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.java* line no. 1122
> There is some concern mentioned that supporting OR would lead to data explosion. Is it discussed/documented in a little more detail somewhere? If so, some pointers towards the same will be helpful.
>
> Thanks,
> MIS.
>
> On Tue, Mar 22, 2011 at 1:19 PM, MIS misapa...@gmail.com wrote:
> > I want to use OR in the join expression, but it seems only AND is supported as of now.
> > I have a work around though to use DeMorgan's law {C1 OR C2 = !(!C1 AND !C2)}, but it would be nice if somebody can point me to the location in code base that would need modification to support the OR in the join expression.
> >
> > Thanks,
> > MIS.
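The nested-loop map-side join suggested in the reply above can be sketched in a few lines. The sketch assumes, as the reply does, that the RHS table is small enough to be held in memory by every mapper, and it treats the join condition as an arbitrary predicate so that OR needs no special handling. NestedLoopMapJoin and join are hypothetical names; this is not a Hive operator, and the output format is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Illustrative only: not a Hive operator. */
public class NestedLoopMapJoin {

    /**
     * Joins one split of the large (LHS) table against the whole small (RHS)
     * table. Every pair is tested, so any condition works, at a cost of
     * |leftSplit| * |smallRight| predicate evaluations per mapper.
     */
    public static <L, R> List<String> join(List<L> leftSplit, List<R> smallRight,
                                           BiPredicate<L, R> condition) {
        List<String> out = new ArrayList<>();
        for (L left : leftSplit) {
            for (R right : smallRight) {
                if (condition.test(left, right)) {
                    out.add(left + "|" + right);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> leftSplit = Arrays.asList(1, 2, 13);
        List<Integer> smallRight = Arrays.asList(2, 3);
        // Join condition with OR: l = r OR l = r + 10
        List<String> rows = join(leftSplit, smallRight,
                (l, r) -> l.equals(r) || l == r + 10);
        System.out.println(rows); // prints [2|2, 13|3]
    }
}
```

The quadratic pair count is why this only makes sense on the map side with a small RHS; with a large RHS each mapper would have to scan the whole table, which is the scalability problem described in the reply.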
Bug in using columns with leading underscores in subqueries
Hi,

I believe I've found a bug in the semantic analyzer (or maybe something else?). It occurs when using a column with a leading underscore in a subquery.

create table temp (`_col` int, key int);
select key from temp;
select `_col` from temp;
select key from (select key from temp) t;

The above queries all work fine.

select `_col` from (select `_col` from temp) t;

This query fails with:

FAILED: Error in semantic analysis: line 1:7 Invalid Table Alias or Column Reference `_col`

The following query works in lieu of the above.

select col as `_col` from (select `_col` as col from temp) t;

Thanks,
Marquis Wang
HMC Computer Science '11
Review Request: HIVE-2069: NullPointerException on getSchemas
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/521/
-----------------------------------------------------------

Review request for hive.

Summary
-------

HIVE-2069: NullPointerException on getSchemas

This addresses bug HIVE-2069.
    https://issues.apache.org/jira/browse/HIVE-2069

Diffs
-----

  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveMetaDataResultSet.java 1083926
  trunk/jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java 1083926

Diff: https://reviews.apache.org/r/521/diff

Testing
-------

Thanks,

Bennie
[jira] [Created] (HIVE-2071) enforcereadonlytables hook should not check a configuration variable
enforcereadonlytables hook should not check a configuration variable Key: HIVE-2071 URL: https://issues.apache.org/jira/browse/HIVE-2071 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Krishna Kumar Instead of adding a new configuration parameter which is being checked in EnforceReadOnlyTables, it might be easier to remove EnforceReadOnlyTables from the hive.exec.pre.hooks at creation time. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-1803: Implement bitmap indexing in Hive (new review starting from patch 6)
> On None, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java, line 45
> > https://reviews.apache.org/r/481/diff/1/?file=13771#file13771line45
> >
> >     I'm confused about how the backwards compatibility works for the index filename property...who uses this property name?

The property name is set on the command line when the index query is run (see the index_compact.q tests). This String is how the class knows where the index filename is stored.

> On None, John Sichi wrote:
> > ql/build.xml, line 187
> > https://reviews.apache.org/r/481/diff/1/?file=13758#file13758line187
> >
> >     Why do you need to unpack the .jar? And why to json/classes?

I was getting java.lang.NoClassDefFoundError: javaewah/EWAHCompressedBitmap errors at runtime without unpacking it. I guess I forgot to change the destination to something else when I copied that line. Is unpacking the .jar unnecessary? I'm not really familiar with how ivy(?) handles these libraries.

- Marquis

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/481/#review315
-----------------------------------------------------------

On 2011-03-08 16:27:50, John Sichi wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/481/
-----------------------------------------------------------

(Updated 2011-03-08 16:27:50)

Review request for hive.

Summary
-------

Review board was giving me grief trying to update the old patch, so I'm creating a fresh review request for HIVE-1803.6

This addresses bug HIVE-1803.
https://issues.apache.org/jira/browse/HIVE-1803 Diffs - lib/README 1c2f0b1 lib/javaewah-0.2.jar PRE-CREATION ql/build.xml 50c604e ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ba222f3 ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java ff74f08 ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeTask.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeWork.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectInput.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectOutput.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1f01446 ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java 6c320c5 ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexResult.java 0c9ccea ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeTask.java eac168f ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeWork.java 26beb4e ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 391e5de ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 77220a1 ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 30714b8 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapOp.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEWAHBitmap.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapAnd.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapOr.java PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap.q PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap1.q PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap2.q PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap3.q PRE-CREATION ql/src/test/queries/clientpositive/index_compact.q 6547a52 ql/src/test/queries/clientpositive/index_compact_1.q 6d59353 ql/src/test/queries/clientpositive/index_compact_2.q 358b5e9 ql/src/test/queries/clientpositive/index_compact_3.q ee8abda ql/src/test/queries/clientpositive/udf_bitmap_and.q PRE-CREATION ql/src/test/queries/clientpositive/udf_bitmap_or.q PRE-CREATION ql/src/test/results/clientpositive/index_bitmap.q.out PRE-CREATION ql/src/test/results/clientpositive/index_bitmap1.q.out PRE-CREATION ql/src/test/results/clientpositive/index_bitmap2.q.out PRE-CREATION
Re: Review Request: HIVE-2054: fix for IOException on the jdbc driver on windows.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/513/
-----------------------------------------------------------

(Updated 2011-03-21 12:50:40.422997)

Review request for hive.

Changes
-------

New patch because of changes from HIVE-2062

Summary
-------

HIVE-2054: fix for IOException on the jdbc driver on windows.

This addresses bug HIVE-2054.
    https://issues.apache.org/jira/browse/HIVE-2054

Diffs (updated)
-----

  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 1083914
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java 1083914
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1083914
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/JdbcSessionState.java 1083914

Diff: https://reviews.apache.org/r/513/diff

Testing
-------

Thanks,

Bennie
Review Request: Patch for HIVE-2003: Load analysis should add table/partition to the outputs
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/518/ --- Review request for hive. Summary --- Patch for HIVE-2003: Load analysis should add table/partition to the outputs Diffs - contrib/src/test/results/clientpositive/serde_regex.q.out c8b2dac contrib/src/test/results/clientpositive/serde_s3.q.out 95cc726 ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 1ff9ea3 ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 892e759 ql/src/test/org/apache/hadoop/hive/ql/hooks/EnforceReadOnlyTables.java 86a6d49 ql/src/test/queries/clientnegative/load_exist_part_authfail.q PRE-CREATION ql/src/test/queries/clientnegative/load_nonpart_authfail.q PRE-CREATION ql/src/test/queries/clientnegative/load_part_authfail.q PRE-CREATION ql/src/test/queries/clientnegative/load_part_nospec.q PRE-CREATION ql/src/test/queries/clientpositive/load_exist_part_authsuccess.q PRE-CREATION ql/src/test/queries/clientpositive/load_nonpart_authsuccess.q PRE-CREATION ql/src/test/queries/clientpositive/load_part_authsuccess.q PRE-CREATION ql/src/test/results/clientnegative/exim_00_unsupported_schema.q.out 119510d ql/src/test/results/clientnegative/exim_01_nonpart_over_loaded.q.out 242da6c ql/src/test/results/clientnegative/exim_02_all_part_over_overlap.q.out b8b019b ql/src/test/results/clientnegative/exim_03_nonpart_noncompat_colschema.q.out 420eade ql/src/test/results/clientnegative/exim_04_nonpart_noncompat_colnumber.q.out 8b89284 ql/src/test/results/clientnegative/exim_05_nonpart_noncompat_coltype.q.out a07fb62 ql/src/test/results/clientnegative/exim_06_nonpart_noncompat_storage.q.out c7638d2 ql/src/test/results/clientnegative/exim_07_nonpart_noncompat_ifof.q.out 3062dbe ql/src/test/results/clientnegative/exim_08_nonpart_noncompat_serde.q.out f229498 ql/src/test/results/clientnegative/exim_09_nonpart_noncompat_serdeparam.q.out 92c27ad ql/src/test/results/clientnegative/exim_10_nonpart_noncompat_bucketing.q.out a98f4f9 
ql/src/test/results/clientnegative/exim_11_nonpart_noncompat_sorting.q.out 1fe4b50 ql/src/test/results/clientnegative/exim_13_nonnative_import.q.out 4c4297e ql/src/test/results/clientnegative/exim_14_nonpart_part.q.out 04fa808 ql/src/test/results/clientnegative/exim_15_part_nonpart.q.out e1c67bb ql/src/test/results/clientnegative/exim_16_part_noncompat_schema.q.out 2393918 ql/src/test/results/clientnegative/exim_17_part_spec_underspec.q.out 7f29cb6 ql/src/test/results/clientnegative/exim_18_part_spec_missing.q.out 7f29cb6 ql/src/test/results/clientnegative/exim_19_external_over_existing.q.out 0711b89 ql/src/test/results/clientnegative/exim_20_managed_location_over_existing.q.out 3ad0ad5 ql/src/test/results/clientnegative/exim_21_part_managed_external.q.out 42c7600 ql/src/test/results/clientnegative/exim_23_import_exist_authfail.q.out 8372910 ql/src/test/results/clientnegative/exim_24_import_part_authfail.q.out 0d82700 ql/src/test/results/clientnegative/exim_25_import_nonexist_authfail.q.out 3814e14 ql/src/test/results/clientnegative/fetchtask_ioexception.q.out b9dd07c ql/src/test/results/clientnegative/load_exist_part_authfail.q.out PRE-CREATION ql/src/test/results/clientnegative/load_nonpart_authfail.q.out PRE-CREATION ql/src/test/results/clientnegative/load_part_authfail.q.out PRE-CREATION ql/src/test/results/clientnegative/load_part_nospec.q.out PRE-CREATION ql/src/test/results/clientnegative/load_wrong_fileformat.q.out 645e143 ql/src/test/results/clientnegative/load_wrong_fileformat_rc_seq.q.out 4809d31 ql/src/test/results/clientnegative/load_wrong_fileformat_txt_seq.q.out 9b1ea48 ql/src/test/results/clientnegative/protectmode_part2.q.out daaae80 ql/src/test/results/clientpositive/alter3.q.out e6e5b49 ql/src/test/results/clientpositive/alter_merge.q.out 789ca14 ql/src/test/results/clientpositive/alter_merge_stats.q.out 5c9d387 ql/src/test/results/clientpositive/auto_join_filters.q.out 167c4b0 ql/src/test/results/clientpositive/auto_join_nulls.q.out 4ced637 
ql/src/test/results/clientpositive/binarysortable_1.q.out a2e540e ql/src/test/results/clientpositive/bucketizedhiveinputformat.q.out cd3489e ql/src/test/results/clientpositive/bucketmapjoin1.q.out da27428 ql/src/test/results/clientpositive/bucketmapjoin2.q.out 4aeb731 ql/src/test/results/clientpositive/bucketmapjoin3.q.out 1109aae ql/src/test/results/clientpositive/bucketmapjoin4.q.out a45b625 ql/src/test/results/clientpositive/bucketmapjoin5.q.out 3858ae0 ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out c5b4a9c ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out b320252 ql/src/test/results/clientpositive/count.q.out 0b4032c
Re: skew join optimization
How about a link to http://imageshack.us/ or TinyPic?

Thanks

On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
> On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu yuzhih...@gmail.com wrote:
> > Can someone re-attach the missing figures for that wiki?
> >
> > Thanks
> >
> > On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada bharathvissapragada1...@gmail.com wrote:
> > > Hi Igor,
> > >
> > > See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the jira 1642, which automatically converts a normal join into a map-join (otherwise you can specify the mapjoin hints in the query itself). Because your 'S' table is very small, it can be replicated across all the mappers and the reduce phase can be avoided. This can greatly reduce the runtime (see the results section in the page for details).
> > >
> > > Hope this helps.
> > >
> > > Thanks
> > >
> > > On Sun, Mar 20, 2011 at 6:37 PM, Jov zhao6...@gmail.com wrote:
> > > > 2011/3/20 Igor Tatarinov i...@decide.com:
> > > > > I have the following join that takes 4.5 hours (with 12 nodes) mostly because of a single reduce task that gets the bulk of the work:
> > > > >
> > > > > SELECT ... FROM T LEFT OUTER JOIN S ON T.timestamp = S.timestamp and T.id = S.id
> > > > >
> > > > > This is a 1:0/1 join so the size of the output is exactly the same as the size of T (500M records). S is actually very small (5K).
> > > > >
> > > > > I've tried:
> > > > > - switching the order of the join conditions
> > > > > - using a different hash function setting (jenkins instead of murmur)
> > > > > - using SET hive.auto.convert.join = true;
> > > >
> > > > Are you sure your query is converted to a mapjoin? If not, try using an explicit mapjoin hint.
> > > >
> > > > > - using SET hive.optimize.skewjoin = true;
> > > > >
> > > > > but nothing helped :(
> > > > > Anything else I can try?
> > > > >
> > > > > Thanks!
> > >
> > > --
> > > Regards,
> > > Bharath .V
> > > w:http://research.iiit.ac.in/~bharath.v
>
> The wiki does not allow images; Confluence does, but we have not moved there yet.
Re: skew join optimization
Can someone re-attach the missing figures for that wiki?

Thanks

On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada bharathvissapragada1...@gmail.com wrote:
> Hi Igor,
>
> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the jira 1642, which automatically converts a normal join into a map-join (otherwise you can specify the mapjoin hints in the query itself). Because your 'S' table is very small, it can be replicated across all the mappers and the reduce phase can be avoided. This can greatly reduce the runtime (see the results section in the page for details).
>
> Hope this helps.
>
> Thanks
>
> On Sun, Mar 20, 2011 at 6:37 PM, Jov zhao6...@gmail.com wrote:
> > 2011/3/20 Igor Tatarinov i...@decide.com:
> > > I have the following join that takes 4.5 hours (with 12 nodes) mostly because of a single reduce task that gets the bulk of the work:
> > >
> > > SELECT ... FROM T LEFT OUTER JOIN S ON T.timestamp = S.timestamp and T.id = S.id
> > >
> > > This is a 1:0/1 join so the size of the output is exactly the same as the size of T (500M records). S is actually very small (5K).
> > >
> > > I've tried:
> > > - switching the order of the join conditions
> > > - using a different hash function setting (jenkins instead of murmur)
> > > - using SET hive.auto.convert.join = true;
> >
> > Are you sure your query is converted to a mapjoin? If not, try using an explicit mapjoin hint.
> >
> > > - using SET hive.optimize.skewjoin = true;
> > >
> > > but nothing helped :(
> > > Anything else I can try?
> > >
> > > Thanks!
>
> --
> Regards,
> Bharath .V
> w:http://research.iiit.ac.in/~bharath.v
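The map-join idea suggested in this thread can be sketched as follows: because S is tiny, each mapper can load it into a hash table keyed on (timestamp, id) and stream its share of T past it, so no reduce phase and no skewed reducer are involved. The sketch is illustrative only; MapSideLeftOuterJoin, key, and the row representation (a long[] of timestamp and id for T, a string payload for S) are assumptions, not Hive's map-join operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not Hive's map-join implementation. */
public class MapSideLeftOuterJoin {

    /** Composite join key for (timestamp, id); hypothetical encoding. */
    public static String key(long timestamp, long id) {
        return timestamp + ":" + id;
    }

    /**
     * Streams the rows of T and probes the in-memory copy of S.
     * Left outer semantics: every T row is emitted exactly once, with
     * "null" in the S column when there is no match (a 1:0/1 join).
     */
    public static List<String> join(List<long[]> tRows, Map<String, String> sByKey) {
        List<String> out = new ArrayList<>();
        for (long[] t : tRows) {
            String match = sByKey.get(key(t[0], t[1])); // null when S has no such row
            out.add(t[0] + "," + t[1] + "," + match);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> s = new HashMap<>();
        s.put(key(100L, 1L), "s-row");

        List<long[]> t = Arrays.asList(new long[] {100L, 1L}, new long[] {100L, 2L});
        for (String row : join(t, s)) {
            System.out.println(row);
        }
        // prints:
        // 100,1,s-row
        // 100,2,null
    }
}
```

Each T row costs one hash lookup, and the output has exactly as many rows as T, which matches the 1:0/1 shape described in the original question.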
[jira] [Commented] (HIVE-2050) batch processing partition pruning process
[ https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010001#comment-13010001 ]

Ning Zhang commented on HIVE-2050:
----------------------------------

Note that this patch implements a simple API that passes a list of partition names rather than a range of partition names. My performance testing indicates that the bottleneck is not in the JDO query itself. The JDO query that gets the list of all MPartitions takes about 5 secs for a list of 20k partitions. However, converting these 20k MPartitions to Partitions took about 3 mins, and committing the transaction took another 3 mins. Note that converting MPartitions to Partitions and committing transactions are common operations. Even if we use JDO pushdown (HIVE-2048) or range queries, these costs remain. We need to optimize them away in the next step.

> batch processing partition pruning process
> ------------------------------------------
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
> For partition predicates that cannot be pushed down to JDO filtering (HIVE-2049), we should fall back to the old approach of listing all partition names first and using Hive's expression evaluation engine to select the correct partitions. The partition pruner should then hand Hive a list of partition names and get back a list of Partition objects (this should be added to the Hive API).
> A possible optimization is for the partition pruner to give Hive a set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), with the JDO query formulated as range queries. Range queries are possible because the first step lists all partition names in sorted order. It is easy to come up with a range, and the JDO range query results are guaranteed to be equivalent to those of the query with a list of partition names.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-trunk-h0.20 #634
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/634/changes

Changes:

[namit] HIVE-2003 LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it (Krishna Kumar via namit)

------------------------------------------
[...truncated 34300 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-22_22-31-53_647_1696713168865305108/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit] set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit] set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit] set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0
[junit] 2011-03-22 22:31:56,771 null map = 100%, reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-22_22-31-53_647_1696713168865305108/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_20110331_1564066846.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-22_22-31-58_373_6901690555475865206/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-22_22-31-58_373_6901690555475865206/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_20110331_922641636.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit]