[jira] Created: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load
Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load statement.

Key: HIVE-2031
URL: https://issues.apache.org/jira/browse/HIVE-2031
Project: Hive
Issue Type: Bug
Components: Logging
Affects Versions: 0.7.0
Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam

Loading into a partitioned table that has two partition columns, while specifying only one of them in the load statement, fails and logs the following exception message.

{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not found '21Oct'
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.init(BaseSemanticAnalyzer.java:685)
 at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
 at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151)
 at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764)
 at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742)
 at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
{noformat}

The message should be corrected so that it states the actual root cause of the failure.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1976) Exception should be thrown when invalid jar,file,archive is given to add command
[ https://issues.apache.org/jira/browse/HIVE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-1976:
Attachment: HIVE-1976.3.patch

Exception should be thrown when invalid jar,file,archive is given to add command

Key: HIVE-1976
URL: https://issues.apache.org/jira/browse/HIVE-1976
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0, 0.7.0
Environment: Hadoop 0.20.1, Hive 0.5.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Attachments: HIVE-1976.2.patch, HIVE-1976.3.patch, HIVE-1976.patch

When the add command is executed with a nonexistent jar, it should throw an exception through HiveStatement. Example:

{noformat}
add jar /root/invalidpath/testjar.jar
{noformat}

Here testjar.jar does not exist, so the command should throw an exception.
[jira] Updated: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load
[ https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-2031:
Attachment: HIVE-2031.patch
[jira] Commented: (HIVE-1976) Exception should be thrown when invalid jar,file,archive is given to add command
[ https://issues.apache.org/jira/browse/HIVE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003918#comment-13003918 ]

Chinna Rao Lalam commented on HIVE-1976:

Sorry, I missed the behavior change in the CLI scenario. I have now attached a patch with a new solution: instead of throwing a RuntimeException, it returns a CommandProcessorResponse with a non-zero responseCode. HiveServer.java (HiveServer mode), CliDriver.java (CLI mode) and HWISessionItem.java (Hive Web UI mode) then respond based on the responseCode of the CommandProcessorResponse. This way there is no behavior change. Please review this and give your comments.
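The idea in the comment above (report the failure through a response code rather than an exception) can be sketched roughly as follows. This is only an illustration under assumptions: the Response class and the addResource method are hypothetical stand-ins written for this example, not the actual CommandProcessorResponse or the code in the attached patch.

```java
import java.io.File;

public class AddResourceSketch {
    /** Hypothetical stand-in for a command response; 0 means success. */
    static final class Response {
        final int responseCode;
        final String errorMessage;

        Response(int responseCode, String errorMessage) {
            this.responseCode = responseCode;
            this.errorMessage = errorMessage;
        }
    }

    /**
     * Checks that the resource exists and reports a missing file through
     * a non-zero response code instead of throwing a RuntimeException.
     */
    static Response addResource(String type, String path) {
        File f = new File(path);
        if (!f.exists()) {
            return new Response(1, type + " " + path + " does not exist");
        }
        return new Response(0, null);
    }

    public static void main(String[] args) {
        Response r = addResource("jar", "/root/invalidpath/testjar.jar");
        // Each caller (server, CLI, web UI) decides how to surface the failure.
        if (r.responseCode != 0) {
            System.err.println("FAILED: " + r.errorMessage);
        }
    }
}
```

With this shape, a server-side caller could turn the non-zero code into an exception for its clients, while the CLI could simply print the message and continue.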
[jira] Commented: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the lo
[ https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003951#comment-13003951 ]

Chinna Rao Lalam commented on HIVE-2031:

{noformat}
create table sampletable (a string,b string) PARTITIONED BY(dt STRING, country STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '@';
LOAD DATA INPATH '/user/root/mytable/joindata.txt' OVERWRITE INTO TABLE sampletable partition (dt='21Oct');
{noformat}

The above query fails because the load statement does not specify both partition columns. If the log message looked like the following, it would be easy to debug:

{noformat}
2011-03-08 17:13:20,901 ERROR ql.Driver (SessionState.java:printError(365)) - FAILED: Error in semantic analysis: line 1:91 Partition not found '21Oct'
org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not found '21Oct'
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.init(BaseSemanticAnalyzer.java:685)
 .
 at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: table is partitioned but partition spec is not specified or does not fully match table partitioning: {dt=21Oct}
 at org.apache.hadoop.hive.ql.metadata.Table.isValidSpec(Table.java:341)
 ... 11 more
{noformat}
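To illustrate the kind of check that would produce a root-cause message for this scenario, here is a minimal sketch. It assumes the table's partition columns and the spec from the load statement are available as plain collections; the class and method names are hypothetical and this is not the code from HIVE-2031.patch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionSpecCheck {
    /**
     * Returns null when the spec names exactly the table's partition columns,
     * otherwise a message that states what is missing or mismatched.
     */
    static String validate(List<String> partitionColumns, Map<String, String> spec) {
        boolean fullMatch = spec.size() == partitionColumns.size()
                && spec.keySet().containsAll(partitionColumns);
        if (fullMatch) {
            return null;
        }
        return "table is partitioned by " + partitionColumns
                + " but partition spec " + spec
                + " does not fully match table partitioning";
    }

    public static void main(String[] args) {
        // Same shape as the failing statement: only dt is given, country is missing.
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("dt", "21Oct");
        System.out.println(validate(Arrays.asList("dt", "country"), spec));
    }
}
```

A message built this way names both the expected partition columns and the spec that was actually supplied, which is the information missing from "Partition not found '21Oct'".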
[jira] Updated: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load
[ https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-2031:
Status: Patch Available (was: Open)
[jira] Commented: (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004129#comment-13004129 ] Andrew Wilson commented on HIVE-1555: - Hi, Can I get this issue assigned to me? I have a basic implementation working, which I'd like to contribute. It wraps the DBInputFormat and DBOutputFormat classes. It expects values for the DBConfiguration properties to be provided through the SERDEPROPERTIES block in the create table statement. The configureTableJobProperties() method copies these properties out of the table description and into each job context. It also allows users to set SerDe properties which will cause the DBOutputFormat to generate UPSERT sql statements or DELETE sql statements instead of the vanilla INSERT sql generated by default. Right now this feature has a MySql bias. I am still trying to decide what the best way is to make this more database vendor agnostic. Andrew JDBC Storage Handler Key: HIVE-1555 URL: https://issues.apache.org/jira/browse/HIVE-1555 Project: Hive Issue Type: New Feature Components: JDBC Reporter: Bob Robertson Original Estimate: 24h Remaining Estimate: 24h With the Cassandra and HBase Storage Handlers I thought it would make sense to include a generic JDBC RDBMS Storage Handler so that you could import a standard DB table into Hive. Many people must want to perform HiveQL joins, etc against tables in other systems etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
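The property-copying step described in the comment above might look roughly like the sketch below. It uses plain maps in place of the table description and the job configuration, and the "mapred.jdbc." key prefix is an assumption made for the example; the real handler may select properties differently.

```java
import java.util.HashMap;
import java.util.Map;

public class JdbcPropertyCopySketch {
    /**
     * Copies every table property whose key starts with the given prefix
     * into the job properties, leaving all other properties untouched.
     */
    static void copyJdbcProperties(Map<String, String> tableProps,
                                   Map<String, String> jobProps,
                                   String prefix) {
        for (Map.Entry<String, String> e : tableProps.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                jobProps.put(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new HashMap<>();
        // Assumed property names, for illustration only.
        tableProps.put("mapred.jdbc.url", "jdbc:mysql://localhost/test");
        tableProps.put("field.delim", ",");

        Map<String, String> jobProps = new HashMap<>();
        copyJdbcProperties(tableProps, jobProps, "mapred.jdbc.");
        System.out.println(jobProps);
    }
}
```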
[jira] Assigned: (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-1555:
Assignee: Andrew Wilson
[jira] Created: (HIVE-2032) create database does not honour warehouse.dir in dbproperties
create database does not honour warehouse.dir in dbproperties - Key: HIVE-2032 URL: https://issues.apache.org/jira/browse/HIVE-2032 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.7.0, 0.8.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 0.8.0 # create database db with dbproperties ('hive.metastore.warehouse.dir' = 'loc'); The above command does not set location of 'db' to 'loc'. It instead creates 'db.db' under the warehouse directory configured in hive-site.xml of CLI. Looks conflicting with HIVE-1820's expectation. If scratch dir is specified here, that is honoured. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-2033) A database's warehouse.dir is not used for tables created in it.
A database's warehouse.dir is not used for tables created in it. Key: HIVE-2033 URL: https://issues.apache.org/jira/browse/HIVE-2033 Project: Hive Issue Type: Bug Components: Clients, Metastore Affects Versions: 0.7.0, 0.8.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 0.8.0 $ create database db with dbproperties ('hive.metastore.warehouse.dir' = 'loc'); $ use db; $ create table test(name string); Table 'test's location is not under 'loc'. Instead its under hive-site.xml's warehouse dir. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2033) A database's warehouse.dir is not used for tables created in it.
[ https://issues.apache.org/jira/browse/HIVE-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HIVE-2033:
Attachment: HIVE-2033_prelim.patch

Preliminary patch. Test cases being added.
[jira] Commented: (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004148#comment-13004148 ]

Tim Perkins commented on HIVE-1555:

hey... you need to get off this email address. I don't know who on your team is improperly claiming this address as their own, but they're mistaken. Please remove this address from your system.
Build failed in Jenkins: Hive-0.7.0-h0.20 #31
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/31/ -- [...truncated 26800 lines...] [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103081145_409011186.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-08_11-45-08_932_3545880278145320390/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set 
mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] 2011-03-08 11:45:11,974 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-08_11-45-08_932_3545880278145320390/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103081145_648816479.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD 
[junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-08_11-45-13_557_6849149356289617543/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-08_11-45-13_557_6849149356289617543/-mr-1 [junit] OK [junit] PREHOOK: query: drop table
[jira] Commented: (HIVE-1991) Hive Shell to output number of mappers and number of reducers
[ https://issues.apache.org/jira/browse/HIVE-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004226#comment-13004226 ]

Siying Dong commented on HIVE-1991:

This change was overridden by HIVE-1950.

Hive Shell to output number of mappers and number of reducers

Key: HIVE-1991
URL: https://issues.apache.org/jira/browse/HIVE-1991
Project: Hive
Issue Type: Improvement
Components: CLI
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Trivial
Fix For: 0.8.0
Attachments: HIVE-1991.1.patch, HIVE-1991.2.patch

The number of mappers and the number of reducers are useful information to output for users.
[jira] Updated: (HIVE-2034) Backport HIVE-1991 after overridden by HIVE-1950
[ https://issues.apache.org/jira/browse/HIVE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2034: -- Attachment: HIVE-2034.1.patch Backport HIVE-1991 after overridden by HIVE-1950 Key: HIVE-2034 URL: https://issues.apache.org/jira/browse/HIVE-2034 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Priority: Trivial Attachments: HIVE-2034.1.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2034) Backport HIVE-1991 after overridden by HIVE-1950
[ https://issues.apache.org/jira/browse/HIVE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2034:
Status: Patch Available (was: Open)
[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004253#comment-13004253 ]

He Yongqiang commented on HIVE-2030:

The ContentSummary is not guaranteed to be populated. Even if it is, it seems this information is not passed to the child process. (So this is non-empty only when executing in local mode.)

isEmptyPath() to use ContentSummary cache

Key: HIVE-2030
URL: https://issues.apache.org/jira/browse/HIVE-2030
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
Attachments: HIVE-2030.1.patch

addInputPaths() calls isEmptyPath() for every input path. Currently every call is a DFS namenode call. By making isEmptyPath() use the cached ContentSummary, we should be able to avoid some namenode calls and reduce latency in the case of multiple partitions.
[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2030:
Status: Open (was: Patch Available)
hooks in metastore functions
Hi all,

I have a requirement that every time a change takes place in the metastore, some additional logic needs to run. For example, when a new table is created in the metastore, I want to send a message to a message bus.

The easiest way to make this work is to add the logic in createTable(), control it by a HiveConf parameter, and turn it off by default. The alternative is to use hooks: put the extra logic in a hook, then load and fire the hook if it is available. Does anyone have an opinion on which of these two is preferable? The second one requires new hook loading and execution logic.

I am currently interested in four functions: createTable(), dropTable(), addPartition() and dropPartition(). The current HiveMetaHook, which exists in createTable(), doesn't quite fit the bill, since it is fired only when the user requests it in the create table statement (i.e., when a storage handler is specified). Instead, I want this logic to run always.

If this is unclear, let me know and I can post code that demonstrates my use case.

Ashutosh
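For discussion, the hook-based alternative from the message above could be sketched along these lines. Everything here is hypothetical: the listener interface, the comma-separated class-name setting and the helper methods are assumptions for illustration and do not correspond to an existing Hive API.

```java
import java.util.ArrayList;
import java.util.List;

public class MetastoreListenerSketch {
    /** Hypothetical callback interface; not an existing Hive API. */
    public interface MetastoreEventListener {
        void onEvent(String eventType, String objectName);
    }

    /** Example listener that records events in memory instead of using a message bus. */
    public static class RecordingListener implements MetastoreEventListener {
        public final List<String> events = new ArrayList<>();

        @Override
        public void onEvent(String eventType, String objectName) {
            events.add(eventType + ":" + objectName);
        }
    }

    /**
     * Instantiates listeners from a comma-separated list of class names.
     * An empty or null value yields no listeners, i.e. the feature is off by default.
     */
    static List<MetastoreEventListener> loadListeners(String classNames) {
        List<MetastoreEventListener> listeners = new ArrayList<>();
        if (classNames == null || classNames.trim().isEmpty()) {
            return listeners;
        }
        for (String name : classNames.split(",")) {
            try {
                Object o = Class.forName(name.trim()).getDeclaredConstructor().newInstance();
                listeners.add((MetastoreEventListener) o);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load listener " + name, e);
            }
        }
        return listeners;
    }

    /** Would be called at the end of createTable(), dropTable(), and so on. */
    static void fire(List<MetastoreEventListener> listeners, String eventType, String objectName) {
        for (MetastoreEventListener l : listeners) {
            l.onEvent(eventType, objectName);
        }
    }

    public static void main(String[] args) {
        List<MetastoreEventListener> listeners =
                loadListeners(RecordingListener.class.getName());
        fire(listeners, "createTable", "default.test");
        System.out.println(((RecordingListener) listeners.get(0)).events);
    }
}
```

The trade-off matches the one raised in the message: this needs loading and dispatch code up front, but keeps the message-bus logic out of createTable() itself.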
[jira] Commented: (HIVE-2034) Backport HIVE-1991 after overridden by HIVE-1950
[ https://issues.apache.org/jira/browse/HIVE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004257#comment-13004257 ]

Ning Zhang commented on HIVE-2034:

+1. Will commit if tests pass.
[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2030:
Attachment: HIVE-2030.2.patch

In the case of an exception, we don't populate the cache. This makes sure the cache never gets a wrong value.
[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004260#comment-13004260 ]

Siying Dong commented on HIVE-2030:

Yongqiang, I don't quite understand your comment. If there is a cache miss, we call the original method. We never make things worse.
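The behavior described in this thread (use the cached summary when present, fall back to the original lookup on a miss) can be sketched as below. This is a simplified model under assumptions: a plain map of path lengths stands in for the ContentSummary cache, and a function argument stands in for the namenode call; none of the names come from the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

public class EmptyPathCacheSketch {
    private final Map<String, Long> lengthCache = new HashMap<>();
    private final ToLongFunction<String> remoteLookup;
    int remoteCalls = 0;

    EmptyPathCacheSketch(ToLongFunction<String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Populated elsewhere, and only when the length was computed successfully. */
    void cacheLength(String path, long length) {
        lengthCache.put(path, length);
    }

    /** Uses the cached length if present; otherwise falls back to the remote call. */
    boolean isEmptyPath(String path) {
        Long cached = lengthCache.get(path);
        if (cached != null) {
            return cached == 0L;
        }
        remoteCalls++;
        return remoteLookup.applyAsLong(path) == 0L;
    }

    public static void main(String[] args) {
        EmptyPathCacheSketch s = new EmptyPathCacheSketch(p -> 0L);
        s.cacheLength("/warehouse/t/dt=1", 128L);
        System.out.println(s.isEmptyPath("/warehouse/t/dt=1")); // served from cache
        System.out.println(s.isEmptyPath("/warehouse/t/dt=2")); // falls back to lookup
        System.out.println("remote calls: " + s.remoteCalls);
    }
}
```

In this model a miss costs exactly what the original method cost, so the cache can only remove remote calls, never add them.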
[jira] Updated: (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1803: - Attachment: JavaEWAH_20110304.zip Uploading a .zip of the source for reference. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-1803: Implement bitmap indexing in Hive (new review starting from patch 6)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/481/ --- Review request for hive. Summary --- Review board was giving me grief trying to update the old patch, so I'm creating a fresh review request for HIVE-1803.6 This addresses bug HIVE-1803. https://issues.apache.org/jira/browse/HIVE-1803 Diffs - lib/README 1c2f0b1 lib/javaewah-0.2.jar PRE-CREATION ql/build.xml 50c604e ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ba222f3 ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java ff74f08 ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeTask.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeWork.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectInput.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectOutput.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1f01446 ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java 6c320c5 ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexResult.java 0c9ccea ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeTask.java eac168f ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeWork.java 26beb4e ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 391e5de ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 77220a1 ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 30714b8 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapOp.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEWAHBitmap.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapAnd.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapOr.java PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap.q PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap1.q PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap2.q PRE-CREATION ql/src/test/queries/clientpositive/index_bitmap3.q PRE-CREATION ql/src/test/queries/clientpositive/index_compact.q 6547a52 ql/src/test/queries/clientpositive/index_compact_1.q 6d59353 ql/src/test/queries/clientpositive/index_compact_2.q 358b5e9 ql/src/test/queries/clientpositive/index_compact_3.q ee8abda ql/src/test/queries/clientpositive/udf_bitmap_and.q PRE-CREATION ql/src/test/queries/clientpositive/udf_bitmap_or.q PRE-CREATION ql/src/test/results/clientpositive/index_bitmap.q.out PRE-CREATION ql/src/test/results/clientpositive/index_bitmap1.q.out PRE-CREATION ql/src/test/results/clientpositive/index_bitmap2.q.out PRE-CREATION ql/src/test/results/clientpositive/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/udf_bitmap_and.q.out PRE-CREATION ql/src/test/results/clientpositive/udf_bitmap_or.q.out PRE-CREATION Diff: https://reviews.apache.org/r/481/diff Testing --- Thanks, John
[jira] Updated: (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1803: - Status: Open (was: Patch Available) New review board entry (I failed trying to update the old one with the new patch): https://reviews.apache.org/r/481/ Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing.
[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004317#comment-13004317 ] He Yongqiang commented on HIVE-2030: Okay, will test and commit. isEmptyPath() to use ContentSummary cache - Key: HIVE-2030 URL: https://issues.apache.org/jira/browse/HIVE-2030 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2030.1.patch, HIVE-2030.2.patch addInputPaths() calls isEmptyPath() for every input path. Currently every call is a DFS namenode call. By making isEmptyPath() use the cached ContentSummary, we should be able to avoid some namenode calls and reduce latency when there are multiple partitions.
[jira] Updated: (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Melick updated HIVE-1644: - Attachment: HIVE-1644.7.patch
HIVE-1644.7.patch: @Yongqiang, I fixed the problems in GenMRTableScan1.java, and I think I have dealt with most of your comments. I'm confused about what you mean regarding CombineHiveInputFormat. @John, I made a first attempt at factoring SemanticAnalyzer calls into the ParseContext, but would appreciate your input. As it stands, this patch will also fail the unit test index_opt_where_simple.q. However, if you remove the lines that attempt to use manual indexing, it succeeds. The test that succeeds looks like
{code:sql}
CREATE INDEX src_index ON TABLE src(key) AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX src_index ON src REBUILD;
SET hive.optimize.autoindex=true;
EXPLAIN SELECT key, value FROM src WHERE key=86 ORDER BY key;
SELECT key, value FROM src WHERE key=86 ORDER BY key;
DROP INDEX src_index ON src;
{code}
It appears that our regular expression that identifies WHERE clauses by looking for FIL operators (filters) may not be specific enough. I think the remaining errors might be caused by trying to generate index queries both for the {{SELECT ... FROM src}} (as desired) and for the {{SELECT ... FROM default__src_src_index__}} that we generated ourselves, which is a problem.
use filter pushdown for automatically accessing indexes --- Key: HIVE-1644 URL: https://issues.apache.org/jira/browse/HIVE-1644 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.7.0 Reporter: John Sichi Assignee: Russell Melick Attachments: HIVE-1644.1.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
[jira] Created: (HIVE-2037) Merge result file size should honor hive.merge.size.per.task
Merge result file size should honor hive.merge.size.per.task Key: HIVE-2037 URL: https://issues.apache.org/jira/browse/HIVE-2037 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2037.patch The merge job sets mapred.min.split.size to the value of hive.merge.size.per.task, which is roughly equal to the output file size. However, the input split size is also determined by mapred.min.split.size.per.node, mapred.min.split.size.per.rack, and mapred.max.split.size. These should also be set to the value of hive.merge.size.per.task.
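The change proposed in the HIVE-2037 description can be sketched as follows. This is an illustration only, not the attached patch: the MergeSplitSettings class and its alignSplitSizes method are hypothetical, and a plain Map stands in for the job configuration. The property names are the ones listed in the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; a plain Map stands in for the job configuration. */
final class MergeSplitSettings {
    /** Split-size properties named in the issue description. */
    static final String[] SPLIT_SIZE_KEYS = {
        "mapred.min.split.size",
        "mapred.min.split.size.per.node",
        "mapred.min.split.size.per.rack",
        "mapred.max.split.size"
    };

    /**
     * Returns a copy of conf in which every split-size property is set
     * to the value of hive.merge.size.per.task. The input is not modified.
     */
    static Map<String, String> alignSplitSizes(Map<String, String> conf) {
        String mergeSize = conf.get("hive.merge.size.per.task");
        if (mergeSize == null) {
            throw new IllegalArgumentException("hive.merge.size.per.task is not set");
        }
        Map<String, String> result = new LinkedHashMap<>(conf);
        for (String key : SPLIT_SIZE_KEYS) {
            result.put(key, mergeSize);
        }
        return result;
    }
}
```

Setting all four properties to the same value removes the cases where one of the other limits, rather than hive.merge.size.per.task, ends up deciding the input split size of the merge job.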
[jira] Updated: (HIVE-2037) Merge result file size should honor hive.merge.size.per.task
[ https://issues.apache.org/jira/browse/HIVE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2037: - Status: Patch Available (was: Open) Merge result file size should honor hive.merge.size.per.task Key: HIVE-2037 URL: https://issues.apache.org/jira/browse/HIVE-2037 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2037.patch The merge job sets mapred.min.split.size to the value of hive.merge.size.per.task, which is roughly equal to the output file size. However, the input split size is also determined by mapred.min.split.size.per.node, mapred.min.split.size.per.rack, and mapred.max.split.size. These should also be set to the value of hive.merge.size.per.task.
[jira] Updated: (HIVE-2037) Merge result file size should honor hive.merge.size.per.task
[ https://issues.apache.org/jira/browse/HIVE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2037: - Attachment: HIVE-2037.patch Merge result file size should honor hive.merge.size.per.task Key: HIVE-2037 URL: https://issues.apache.org/jira/browse/HIVE-2037 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2037.patch The merge job sets mapred.min.split.size to the value of hive.merge.size.per.task, which is roughly equal to the output file size. However, the input split size is also determined by mapred.min.split.size.per.node, mapred.min.split.size.per.rack, and mapred.max.split.size. These should also be set to the value of hive.merge.size.per.task.