Query regarding Metastore(Derby) SDS table data
Hi all,

While creating a Hive table using the CREATE TABLE command, the code flow also inserts the HDFS location into the SDS table present in Derby. For example, on executing:

create table sample(rate int) stored as textfile;

the SDS table (the metadata table present in Derby) contains an entry corresponding to the Hive table sample like hdfs://{IP}:9001/user/hive/warehouse/sample, where 9001 is the port configured and {HDFS-URL}/user/hive/warehouse is the warehouse configured.

Now I'm interested to know the intention of maintaining the full HDFS path in the SDS table. Why not a relative path, since Hive is anyhow capable of constructing the full HDFS URL? Any design inputs will surely help.

Thanks,
Chinna Rao Lalam

*** This e-mail and attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
[jira] Updated: (HIVE-1716) make TestHBaseCliDriver use dynamic ports to avoid conflicts with already-running services
[ https://issues.apache.org/jira/browse/HIVE-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1716:
-----------------------------
    Resolution: Fixed
  Hadoop Flags: [Reviewed]
        Status: Resolved (was: Patch Available)

Committed. Thanks John

make TestHBaseCliDriver use dynamic ports to avoid conflicts with already-running services
------------------------------------------------------------------------------------------
                Key: HIVE-1716
                URL: https://issues.apache.org/jira/browse/HIVE-1716
            Project: Hive
         Issue Type: Bug
         Components: HBase Handler
   Affects Versions: 0.7.0
           Reporter: Ning Zhang
           Assignee: John Sichi
            Fix For: 0.7.0
        Attachments: HIVE-1716.1.patch

ant test -Dhadoop.version=0.20.0 -Dtestcase=TestHBaseCliDriver:
    [junit] org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    [junit]     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:976)
    [junit]     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625)
    [junit]     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:607)
    [junit]     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:738)
    [junit]     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
    [junit]     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
    [junit]     at org.apache.hadoop.hbase.client.HTable.init(HTable.java:128)
    [junit]     at org.apache.hadoop.hive.hbase.HBaseTestSetup.setUpFixtures(HBaseTestSetup.java:87)
    [junit]     at org.apache.hadoop.hive.hbase.HBaseTestSetup.preTest(HBaseTestSetup.java:59)
    [junit]     at org.apache.hadoop.hive.hbase.HBaseQTestUtil.init(HBaseQTestUtil.java:31)
    [junit]     at org.apache.hadoop.hive.cli.TestHBaseCliDriver.setUp(TestHBaseCliDriver.java:43)
    [junit]     at junit.framework.TestCase.runBare(TestCase.java:125)
    [junit]     at junit.framework.TestResult$1.protect(TestResult.java:106)
    [junit]     at junit.framework.TestResult.runProtected(TestResult.java:124)
    [junit]     at junit.framework.TestResult.run(TestResult.java:109)
    [junit]     at junit.framework.TestCase.run(TestCase.java:118)
    [junit]     at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit]     at junit.framework.TestSuite.run(TestSuite.java:203)

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-1950) Block merge for RCFile
Block merge for RCFile -- Key: HIVE-1950 URL: https://issues.apache.org/jira/browse/HIVE-1950 Project: Hive Issue Type: New Feature Reporter: He Yongqiang Assignee: He Yongqiang In our env, there are a lot of small files inside one partition/table. In order to reduce the namenode load, we have one dedicated housekeeping job running to merge these files. Right now the merge is an 'insert overwrite' in Hive, which requires decompressing the data and recompressing it. This jira is to add a command in Hive to do the merge without decompressing and recompressing the data. Something like: alter table tbl_name [partition ()] merge files. In this jira the new command will only support RCFile, since it needs some new APIs in the file format. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
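The merge-without-recompression idea can be illustrated with a toy sketch. This is an analogy only (gzip, not RCFile): the gzip format permits multiple compressed members back to back in one stream, so small compressed files can be combined by plain byte concatenation, with no decompress/recompress step. The JIRA proposes adding the file-format APIs RCFile would need for an equivalent block-level merge; all names below are illustrative.

```python
import gzip

def merge_gzip_files(compressed_parts):
    """Merge already-compressed gzip members by byte concatenation,
    with no decompress/recompress step (gzip allows multiple members
    back to back in one stream)."""
    return b"".join(compressed_parts)

# Two small "files", compressed independently.
part1 = gzip.compress(b"row1\nrow2\n")
part2 = gzip.compress(b"row3\n")

merged = merge_gzip_files([part1, part2])
# The merged stream decompresses to the concatenated payload.
assert gzip.decompress(merged) == b"row1\nrow2\nrow3\n"
```

The saving is the same one the JIRA is after: the merge touches only compressed bytes, so the CPU cost of a codec round trip is avoided entirely.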
Build failed in Hudson: Hive-trunk-h0.20 #530
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/530/

--
[...truncated 22570 lines...]
    [junit] POSTHOOK: Output: default@srcbucket
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
    [junit] Loading data to table srcbucket
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket
    [junit] OK
    [junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
    [junit] PREHOOK: type: CREATETABLE
    [junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
    [junit] POSTHOOK: type: CREATETABLE
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
    [junit] Loading data to table src
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@src
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
    [junit] Loading data to table src1
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@src1
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH
Re: Query regarding Metastore(Derby) SDS table data
Different partitions can have different paths - a partition's path need not be a subdirectory of the table's path. At Facebook, we use this regularly, especially for external tables. So it simplifies things if the full path is stored for the partition in the metastore.

Thanks,
-namit

On 2/3/11 2:22 AM, Chinna chinna...@huawei.com wrote:

Hi all, While creating a Hive table using the CREATE TABLE command, the code flow also inserts the HDFS location into the SDS table present in Derby. For example, on executing "create table sample(rate int) stored as textfile;", the SDS table (the metadata table present in Derby) contains an entry corresponding to the Hive table sample like hdfs://{IP}:9001/user/hive/warehouse/sample, where 9001 is the port configured and {HDFS-URL}/user/hive/warehouse is the warehouse configured. Now I'm interested to know the intention of maintaining the full HDFS path in the SDS table - why not a relative path, since Hive is anyhow capable of constructing the full HDFS URL? Any design inputs will surely help. Thanks, Chinna Rao Lalam
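Namit's point can be made concrete with a small sketch. In the toy example below (hypothetical paths and a hypothetical `is_under` helper, not the real metastore schema), one partition lives under the table directory while an external partition lives on a different path - and possibly a different namenode - entirely; only the first could ever be stored relative to the table location.

```python
from urllib.parse import urlparse

# Hypothetical locations (illustrative values, not the real SDS schema):
table_location = "hdfs://nn:9001/user/hive/warehouse/sample"
partition_locations = {
    "ds=2011-02-01": "hdfs://nn:9001/user/hive/warehouse/sample/ds=2011-02-01",
    # An external partition stored entirely outside the table directory:
    "ds=2011-02-02": "hdfs://other-nn:9001/data/raw/events/2011-02-02",
}

def is_under(child, parent):
    """True if `child` could be expressed as a path relative to `parent`."""
    c, p = urlparse(child), urlparse(parent)
    return (c.scheme == p.scheme and c.netloc == p.netloc
            and c.path.startswith(p.path.rstrip("/") + "/"))

# Only the first partition is under the table path; the second could not be
# stored as a relative path at all - hence the full URI in the metastore.
relocatable = {name: is_under(loc, table_location)
               for name, loc in partition_locations.items()}
assert relocatable == {"ds=2011-02-01": True, "ds=2011-02-02": False}
```

Storing the full URI makes both cases uniform: the metastore never has to decide at read time which base path a location is relative to.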
[jira] Updated: (HIVE-1952) fix some outputs and make some tests deterministic
[ https://issues.apache.org/jira/browse/HIVE-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1952: - Attachment: hive.1952.1.patch fix some outputs and make some tests deterministic -- Key: HIVE-1952 URL: https://issues.apache.org/jira/browse/HIVE-1952 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1952.1.patch Some of the tests are non-deterministic, and are causing intermittent diffs -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1952) fix some outputs and make some tests deterministic
[ https://issues.apache.org/jira/browse/HIVE-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1952: - Status: Patch Available (was: Open) fix some outputs and make some tests deterministic -- Key: HIVE-1952 URL: https://issues.apache.org/jira/browse/HIVE-1952 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1952.1.patch Some of the tests are non-deterministic, and are causing intermittent diffs -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-1952) fix some outputs and make some tests deterministic
fix some outputs and make some tests deterministic -- Key: HIVE-1952 URL: https://issues.apache.org/jira/browse/HIVE-1952 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1952.1.patch Some of the tests are non-deterministic, and are causing intermittent diffs -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1952) fix some outputs and make some tests deterministic
[ https://issues.apache.org/jira/browse/HIVE-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990256#comment-12990256 ] He Yongqiang commented on HIVE-1952: +1, running tests. fix some outputs and make some tests deterministic -- Key: HIVE-1952 URL: https://issues.apache.org/jira/browse/HIVE-1952 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1952.1.patch Some of the tests are non-deterministic, and are causing intermittent diffs -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1939) Fix test failure in TestContribCliDriver/url_hook.q
[ https://issues.apache.org/jira/browse/HIVE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1939: - Fix Version/s: 0.7.0 Fix test failure in TestContribCliDriver/url_hook.q --- Key: HIVE-1939 URL: https://issues.apache.org/jira/browse/HIVE-1939 Project: Hive Issue Type: Bug Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.7.0 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1939) Fix test failure in TestContribCliDriver/url_hook.q
[ https://issues.apache.org/jira/browse/HIVE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990293#comment-12990293 ] John Sichi commented on HIVE-1939: -- I did some bisection on svn commits and found that the commit for HIVE-1636 seems to be the point where this broke. http://svn.apache.org/viewvc?view=rev&rev=1063549 Fix test failure in TestContribCliDriver/url_hook.q --- Key: HIVE-1939 URL: https://issues.apache.org/jira/browse/HIVE-1939 Project: Hive Issue Type: Bug Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.7.0 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
Hive queries consuming 100% cpu
Hi,

The simplest of Hive queries seem to be consuming 100% CPU. This is with a small 4-node cluster. The machines are pretty beefy (16 cores per machine, tons of RAM, 16 M+R maximum tasks configured, 1GB RAM for mapred.child.java.opts, etc.). A simple query like "select count(1) from events" shows this (the events table has daily partitions of log files in gzipped file format). While this is probably too generic a question and there is a bunch of investigation we need to do, are there any specific areas for me to look at? Has anyone seen anything like this before? Also, are there any tools or easy options to profile Hive query execution?

Thanks in advance,
Vijay
[jira] Updated: (HIVE-1951) input16_cc.q is failing in testminimrclidriver
[ https://issues.apache.org/jira/browse/HIVE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-1951: --- Attachment: HIVE-1951.1.patch Changing the test file for a quick fix. Will open a new jira for the real problem. The problem here is that Hive should process all comments in CliDriver. A Hive comment can be followed by any other command (not just a Hive query command). input16_cc.q is failing in testminimrclidriver -- Key: HIVE-1951 URL: https://issues.apache.org/jira/browse/HIVE-1951 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: He Yongqiang Attachments: HIVE-1951.1.patch -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1951) input16_cc.q is failing in testminimrclidriver
[ https://issues.apache.org/jira/browse/HIVE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990307#comment-12990307 ] He Yongqiang commented on HIVE-1951: opened jira https://issues.apache.org/jira/browse/HIVE-1953 for the real problem. input16_cc.q is failing in testminimrclidriver -- Key: HIVE-1951 URL: https://issues.apache.org/jira/browse/HIVE-1951 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: He Yongqiang Attachments: HIVE-1951.1.patch -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1939) Fix test failure in TestContribCliDriver/url_hook.q
[ https://issues.apache.org/jira/browse/HIVE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990309#comment-12990309 ] Carl Steinbach commented on HIVE-1939: -- @John: Yup, you're right. The problem is that HIVE-1636 modified 'SHOW TABLES IN db' to throw an error when db doesn't exist. Previously in this situation the SHOW TABLES command just returned an empty result set. url_hook.q points the MetaStore to a new JDO URL and then runs 'SHOW TABLES'. In the past this caused the metastore to initialize a new metastore schema, but without creating the 'default' database. Since 'SHOW TABLES' wasn't checking for the existence of the default database, the command succeeded with an empty result set. I think the correct fix for this problem is to make sure that the metastore creates the 'default' database if it does not already exist. Fix test failure in TestContribCliDriver/url_hook.q --- Key: HIVE-1939 URL: https://issues.apache.org/jira/browse/HIVE-1939 Project: Hive Issue Type: Bug Components: Metastore Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.7.0 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
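The fix Carl describes - create the 'default' database at schema-initialization time if it is missing - is a get-or-create pattern. A minimal sketch (a toy stand-in, not the actual metastore code; all names are illustrative):

```python
# Toy model of the proposed fix: ensure the 'default' database exists
# whenever a fresh metastore schema is initialized, so that a subsequent
# 'SHOW TABLES' succeeds with an empty result set instead of erroring.
class ToyMetaStore:
    DEFAULT_DB = "default"

    def __init__(self):
        self.databases = {}
        self._ensure_default_db()  # runs at schema-initialization time

    def _ensure_default_db(self):
        # Idempotent: only creates the database when it is missing.
        self.databases.setdefault(self.DEFAULT_DB, {"tables": []})

    def show_tables(self, db):
        # Post-HIVE-1636 behavior: error out if the database is missing.
        if db not in self.databases:
            raise ValueError("database %s does not exist" % db)
        return self.databases[db]["tables"]

store = ToyMetaStore()
# 'SHOW TABLES' on a fresh schema now returns an empty result set.
assert store.show_tables("default") == []
```

The key property is idempotence: the ensure step is safe to run on every initialization, whether or not the database already exists.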
[jira] Updated: (HIVE-1922) semantic analysis error, when using group by and order by together
[ https://issues.apache.org/jira/browse/HIVE-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1922:
---------------------------------
    Priority: Critical (was: Blocker)

semantic analysis error, when using group by and order by together
------------------------------------------------------------------
                Key: HIVE-1922
                URL: https://issues.apache.org/jira/browse/HIVE-1922
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
  Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0
        Environment: Ubuntu Karmic, hadoop 0.20.0, hive 0.7.0
           Reporter: Hongwei
           Priority: Critical
  Original Estimate: 168h
 Remaining Estimate: 168h

When I tried queries like 'select t.c from t group by t.c sort by t.c;', hive reported the error 'FAILED: Error in semantic analysis: line 1:40 Invalid Table Alias or Column Reference t'. But 'select t.c from t group by t.c' or 'select t.c from t sort by t.c;' are ok. 'select t.c from t group by t.c sort by c;' is ok too. The hive server gives a stack trace like:

11/01/20 03:07:34 INFO parse.SemanticAnalyzer: Get metadata for subqueries
11/01/20 03:07:34 INFO parse.SemanticAnalyzer: Get metadata for destination tables
11/01/20 03:07:34 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
FAILED: Error in semantic analysis: line 1:40 Invalid Table Alias or Column Reference t
11/01/20 03:07:34 ERROR ql.Driver: FAILED: Error in semantic analysis: line 1:40 Invalid Table Alias or Column Reference t
org.apache.hadoop.hive.ql.parse.SemanticException: line 1:40 Invalid Table Alias or Column Reference t
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6743)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:4288)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5446)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6007)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6583)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:343)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:731)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:116)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:699)
    at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:677)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1951) input16_cc.q is failing in testminimrclidriver
[ https://issues.apache.org/jira/browse/HIVE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1951: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Yongqiang input16_cc.q is failing in testminimrclidriver -- Key: HIVE-1951 URL: https://issues.apache.org/jira/browse/HIVE-1951 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: He Yongqiang Attachments: HIVE-1951.1.patch -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Resolved: (HIVE-1559) Contrib tests not run as part of 'ant test'
[ https://issues.apache.org/jira/browse/HIVE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-1559. -- Resolution: Invalid I think this was fixed a while ago as part of some other ticket. I see contrib/build.xml listed in the filelist of the 'iterate-test' Ant macro. Contrib tests not run as part of 'ant test' --- Key: HIVE-1559 URL: https://issues.apache.org/jira/browse/HIVE-1559 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Namit Jain Copying from https://issues.apache.org/jira/browse/HIVE-1556 BTW, if I run 'ant test' in hive's root directory, it seems the TestContrib* were not tested. Is it expected? TestContribCliDriver should be run as part of 'ant test' -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990331#comment-12990331 ] Anja Gruenheid commented on HIVE-1940: -- I have set up the last stable version, but as far as I understand, some features have been added during the current iteration which have also had an impact on the design of the MetaStore. Is there an up-to-date overview of the MetaStore somewhere, or should I retrace the updates that have been made since the last release? If I can collect all the data that I need, I'll create the model. Query Optimization Using Column Metadata and Histograms --- Key: HIVE-1940 URL: https://issues.apache.org/jira/browse/HIVE-1940 Project: Hive Issue Type: New Feature Components: Metastore, Query Processor Reporter: Anja Gruenheid The current basis for cost-based query optimization in Hive is information gathered on tables and partitions. To make further improvements in query optimization possible, the next step is to develop and implement possibilities to gather information on columns as discussed in issue HIVE-33. After that, an implementation of histograms is a possible option to use and collect run-time statistics. Next to the actual implementation of these features, it is also necessary to develop a consistent storage model for the MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-1954) Allow CLI to connect to Hive server and execute commands remotely
Allow CLI to connect to Hive server and execute commands remotely - Key: HIVE-1954 URL: https://issues.apache.org/jira/browse/HIVE-1954 Project: Hive Issue Type: New Feature Components: CLI Reporter: Ning Zhang Assignee: Ning Zhang

Currently Hive CLI runs the client-side code (compilation, metastore operations, etc.) on the local machine. We should extend the CLI to connect to a Hive server and execute commands remotely. Benefits include:
* the client-side memory requirement is alleviated.
* better security control on the Hive server side.
* possible use of a metastore cache layer on the Hive server side, etc.

-- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Hudson: Hive-trunk-h0.20 #531
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/531/changes

Changes:

[namit] HIVE-1716 Make TestHBaseCliDriver use dynamic ports to avoid conflicts with already-running services (John Sichi via namit)

--
[...truncated 22563 lines...]
    [junit] POSTHOOK: Output: default@srcbucket
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
    [junit] Loading data to table srcbucket
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket
    [junit] OK
    [junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
    [junit] PREHOOK: type: CREATETABLE
    [junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
    [junit] POSTHOOK: type: CREATETABLE
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
    [junit] Loading data to table src
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@src
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
    [junit] Loading data to table src1
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK:
Re: Hive queries consuming 100% cpu
Hey Vijay,

You can go to the mapred UI (normally it runs on port 50030 of the namenode) and see how many map tasks got created for your submitted query. You said that the events table has daily partitions, but the example query you gave does not prune the partitions by specifying a WHERE clause. So I have the following questions:

1) How big is the table (you can just do a hadoop dfs -dus <hdfs-dir-for-table>)? How many partitions?
2) Do you really intend to count the number of events across all days?
3) Could you build a query which computes over 1-5 day(s) and persists the data in a separate table for consumption later on?

Based on your node configuration, I am just guessing the amount of data to process is too large and hence the high CPU.

Thanks,
Viral

On Thu, Feb 3, 2011 at 12:49 PM, Vijay tec...@gmail.com wrote:

Hi, The simplest of Hive queries seem to be consuming 100% CPU. This is with a small 4-node cluster. The machines are pretty beefy (16 cores per machine, tons of RAM, 16 M+R maximum tasks configured, 1GB RAM for mapred.child.java.opts, etc.). A simple query like "select count(1) from events" shows this (the events table has daily partitions of log files in gzipped file format). While this is probably too generic a question and there is a bunch of investigation we need to do, are there any specific areas for me to look at? Has anyone seen anything like this before? Also, are there any tools or easy options to profile Hive query execution? Thanks in advance, Vijay
[jira] Commented: (HIVE-1954) Allow CLI to connect to Hive server and execute commands remotely
[ https://issues.apache.org/jira/browse/HIVE-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990346#comment-12990346 ] Edward Capriolo commented on HIVE-1954: --- This might be a dupe of https://issues.apache.org/jira/browse/HIVE-818 Allow CLI to connect to Hive server and execute commands remotely - Key: HIVE-1954 URL: https://issues.apache.org/jira/browse/HIVE-1954 Project: Hive Issue Type: New Feature Components: CLI Reporter: Ning Zhang Assignee: Ning Zhang Currently Hive CLI runs the client side code (compilation and metastore operations etc) in local machine. We should extend CLI to connect to Hive server and execute commands remotely. Benefits include: * client side memory requirement is alleviated. * better security control on Hive server side. * possible use of metastore cache layer in Hive server side, etc. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-1955) Support non-constant expressions for array indexes.
Support non-constant expressions for array indexes. --- Key: HIVE-1955 URL: https://issues.apache.org/jira/browse/HIVE-1955 Project: Hive Issue Type: Improvement Reporter: Adam Kramer FAILED: Error in semantic analysis: line 4:8 Non Constant Expressions for Array Indexes not Supported ...but I just wrote my own UDF to do this, and it is trivial. We should support this natively. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
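The restriction in HIVE-1955 can be illustrated with a short sketch; the table, column, and UDF names below are illustrative, not taken from the report:

```sql
-- Assuming a hypothetical table t with an array column arr and an int
-- column idx:
SELECT arr[0] FROM t;    -- constant index: accepted
-- SELECT arr[idx] FROM t;
--   fails at compile time with the "Non Constant Expressions for Array
--   Indexes not Supported" error quoted above.

-- The workaround the reporter describes is a trivial UDF, e.g. a
-- hypothetical element_at(arr, idx) that returns arr[idx]:
-- SELECT element_at(arr, idx) FROM t;
```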
[jira] Commented: (HIVE-1954) Allow CLI to connect to Hive server and execute commands remotely
[ https://issues.apache.org/jira/browse/HIVE-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12990372#comment-12990372 ] Ning Zhang commented on HIVE-1954: -- Cool. I should have searched the JIRA first. Are you working on this right now? Allow CLI to connect to Hive server and execute commands remotely - Key: HIVE-1954 URL: https://issues.apache.org/jira/browse/HIVE-1954 Project: Hive Issue Type: New Feature Components: CLI Reporter: Ning Zhang Assignee: Ning Zhang Currently Hive CLI runs the client side code (compilation and metastore operations etc) in local machine. We should extend CLI to connect to Hive server and execute commands remotely. Benefits include: * client side memory requirement is alleviated. * better security control on Hive server side. * possible use of metastore cache layer in Hive server side, etc. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Hive queries consuming 100% cpu
Sorry, I should have given more details. The query was limited by a partition range; I just omitted the WHERE clause in the mail. The table is not that big: for each day there is one gzipped file, and the largest file is about 250MB (close to 2GB uncompressed). I did intend to count; that was just a test, since I wanted to run a query that did the most minimal logic/processing.

Here's a test I ran just now. The query gets count(1) for 8 days. It spawned 8 maps as expected. The maps run for anywhere between 42 and 69 seconds (which may or may not be right; I need to check that). It spawned only one reduce task. The reducer ran for 117 seconds, which seems long for this query.

On Thu, Feb 3, 2011 at 2:31 PM, Viral Bajaria viral.baja...@gmail.com wrote: Hey Vijay, You can go to the mapred UI (it normally runs on port 50030 of the namenode) and see how many map tasks were created for your submitted query. You said that the events table has daily partitions, but the example query you gave does not prune the partitions by specifying a WHERE clause. So I have the following questions: 1) How big is the table (you can just do a hadoop dfs -dus hdfs-dir-for-table)? How many partitions? 2) Do you really intend to count the number of events across all days? 3) Could you build a query which computes over 1-5 day(s) and persists the data in a separate table for consumption later on? Based on your node configuration, I am just guessing that the amount of data to process is too large, hence the high CPU. Thanks, Viral On Thu, Feb 3, 2011 at 12:49 PM, Vijay tec...@gmail.com wrote: Hi, The simplest of Hive queries seem to be consuming 100% CPU. This is with a small 4-node cluster. The machines are pretty beefy (16 cores per machine, tons of RAM, 16 M+R maximum tasks configured, 1GB RAM for mapred.child.java.opts, etc.). A simple query like "select count(1) from events" (the events table has daily partitions of log files in gzipped format). While this is probably too generic a question and there is a bunch of investigation we need to do, are there any specific areas for me to look at? Has anyone seen anything like this before? Also, are there any tools or easy options to profile Hive query execution? Thanks in advance, Vijay
[jira] Updated: (HIVE-1952) fix some outputs and make some tests deterministic
[ https://issues.apache.org/jira/browse/HIVE-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-1952: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed! Thanks Namit! fix some outputs and make some tests deterministic -- Key: HIVE-1952 URL: https://issues.apache.org/jira/browse/HIVE-1952 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.1952.1.patch Some of the tests are non-deterministic and are causing intermittent diffs -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1950) Block merge for RCFile
[ https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-1950: --- Attachment: HIVE-1950.1.patch A patch for review. The code is now fairly clean. Comments about how to make it cleaner are welcome! Block merge for RCFile -- Key: HIVE-1950 URL: https://issues.apache.org/jira/browse/HIVE-1950 Project: Hive Issue Type: New Feature Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1950.1.patch In our environment, there are a lot of small files inside one partition/table. In order to reduce the namenode load, we have one dedicated housekeeping job running to merge these files. Right now the merge is an 'insert overwrite' in Hive, which requires decompressing the data and recompressing it. This JIRA is to add a command in Hive to do the merge without decompressing and recompressing the data. Something like: alter table tbl_name [partition ()] merge files. In this JIRA the new command will only support RCFile, since it needs some new APIs in the file format. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
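The proposed command and the current workaround can be sketched side by side; the table name, column names, and partition spec are illustrative, and the merge syntax is only the proposal from the issue description, not a shipped feature at the time of this thread:

```sql
-- Proposed: merge the small RCFiles of a partition in place, without
-- decompressing or recompressing the data.
ALTER TABLE tbl_name PARTITION (ds='2011-02-03') MERGE FILES;

-- Current workaround: a full rewrite via 'insert overwrite', which
-- decompresses and recompresses every row.
INSERT OVERWRITE TABLE tbl_name PARTITION (ds='2011-02-03')
SELECT key, value FROM tbl_name WHERE ds='2011-02-03';
```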
[jira] Commented: (HIVE-1950) Block merge for RCFile
[ https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990389#comment-12990389 ] He Yongqiang commented on HIVE-1950: review board: https://reviews.apache.org/r/388/ Block merge for RCFile -- Key: HIVE-1950 URL: https://issues.apache.org/jira/browse/HIVE-1950 Project: Hive Issue Type: New Feature Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-1950.1.patch In our environment, there are a lot of small files inside one partition/table. In order to reduce the namenode load, we have one dedicated housekeeping job running to merge these files. Right now the merge is an 'insert overwrite' in Hive, which requires decompressing the data and recompressing it. This JIRA is to add a command in Hive to do the merge without decompressing and recompressing the data. Something like: alter table tbl_name [partition ()] merge files. In this JIRA the new command will only support RCFile, since it needs some new APIs in the file format. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-1950
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/388/ --- Review request for hive. Summary --- early review This addresses bug HIVE-1950. https://issues.apache.org/jira/browse/HIVE-1950 Diffs - trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHook.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Throttle.java 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeInputFormat.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeOutputFormat.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileValueBufferWrapper.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTablePartMergeFilesDesc.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1067036 
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 1067036 trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1067036 trunk/ql/src/test/queries/clientpositive/alter_merge.q PRE-CREATION trunk/ql/src/test/results/clientpositive/alter_merge.q.out PRE-CREATION trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java 1067036 trunk/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 1067036 trunk/shims/src/common/java/org/apache/hadoop/hive/shims/CombineHiveKey.java PRE-CREATION Diff: https://reviews.apache.org/r/388/diff Testing --- Thanks, Yongqiang
[jira] Created: (HIVE-1956) Provide DFS initialization script for Hive
Provide DFS initialization script for Hive --- Key: HIVE-1956 URL: https://issues.apache.org/jira/browse/HIVE-1956 Project: Hive Issue Type: Improvement Components: Configuration, Server Infrastructure Affects Versions: 0.7.0 Reporter: Bruno Mahé Priority: Trivial Fix For: 0.7.0 Attachments: HIVE-1956.patch This script automates the creation of the Hive warehouse and scratch directories on DFS -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
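The directories such a script would create are the standard Hive setup steps. Assuming the default warehouse and scratch locations (the attached patch may differ in details), the script presumably runs commands along these lines, which require a running Hadoop cluster:

```
hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
```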
[jira] Updated: (HIVE-1956) Provide DFS initialization script for Hive
[ https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Mahé updated HIVE-1956: - Attachment: HIVE-1956.patch Provide DFS initialization script for Hive --- Key: HIVE-1956 URL: https://issues.apache.org/jira/browse/HIVE-1956 Project: Hive Issue Type: Improvement Components: Configuration, Server Infrastructure Affects Versions: 0.7.0 Reporter: Bruno Mahé Priority: Trivial Fix For: 0.7.0 Attachments: HIVE-1956.patch This script automates the creation of the Hive warehouse and scratch directories on DFS -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1956) Provide DFS initialization script for Hive
[ https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12990395#comment-12990395 ] Bruno Mahé commented on HIVE-1956: -- Review request: https://reviews.apache.org/r/389/ Provide DFS initialization script for Hive --- Key: HIVE-1956 URL: https://issues.apache.org/jira/browse/HIVE-1956 Project: Hive Issue Type: Improvement Components: Configuration, Server Infrastructure Affects Versions: 0.7.0 Reporter: Bruno Mahé Priority: Trivial Fix For: 0.7.0 Attachments: HIVE-1956.patch This script automates the creation of the Hive warehouse and scratch directories on DFS -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-1941: support explicit view partitioning
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/390/ --- Review request for hive. Summary --- review request from JVS This addresses bug HIVE-1941. https://issues.apache.org/jira/browse/HIVE-1941 Diffs - http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AddPartitionDesc.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableDesc.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure2.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure3.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure4.q PRE-CREATION 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure5.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/analyze_view.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure6.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure7.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure8.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure9.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/create_view_partitioned.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure.q.out 1067043 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure2.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure3.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure4.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure5.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/analyze_view.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_view_failure6.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_view_failure7.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_view_failure8.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_view_failure9.q.out PRE-CREATION 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/create_view_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/390/diff Testing --- Thanks, John
[jira] Updated: (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1941: - Status: Patch Available (was: Open) https://reviews.apache.org/r/390/ support explicit view partitioning -- Key: HIVE-1941 URL: https://issues.apache.org/jira/browse/HIVE-1941 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions. For more information, see http://wiki.apache.org/hadoop/Hive/PartitionedViews -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
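The feature can be sketched from the issue description and the wiki page it references; the view and column names below are illustrative:

```sql
-- A view with an explicit partitioning definition (HIVE-1941 syntax):
CREATE VIEW events_v (id, ds)
PARTITIONED ON (ds)
AS SELECT id, ds FROM events;

-- Unlike table partitions, view partitions carry no data; they are
-- instantiated and removed explicitly:
ALTER VIEW events_v ADD PARTITION (ds='2011-02-03');
ALTER VIEW events_v DROP PARTITION (ds='2011-02-03');
```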
[jira] Commented: (HIVE-1940) Query Optimization Using Column Metadata and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12990411#comment-12990411 ] John Sichi commented on HIVE-1940: -- If you just svn update to the tip of trunk and build/install from there, you'll get the latest metastore. Substantial additions since 0.6 include support for indexes, authorization, and various database properties. Query Optimization Using Column Metadata and Histograms --- Key: HIVE-1940 URL: https://issues.apache.org/jira/browse/HIVE-1940 Project: Hive Issue Type: New Feature Components: Metastore, Query Processor Reporter: Anja Gruenheid The current basis for cost-based query optimization in Hive is information gathered on tables and partitions. To make further improvements in query optimization possible, the next step is to develop and implement possibilities to gather information on columns as discussed in issue HIVE-33. After that, an implementation of histograms is a possible option to use and collect run-time statistics. Next to the actual implementation of these features, it is also necessary to develop a consistent storage model for the MetaStore. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1956) Provide DFS initialization script for Hive
[ https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Mahé updated HIVE-1956: - Status: Patch Available (was: Open) Provide DFS initialization script for Hive --- Key: HIVE-1956 URL: https://issues.apache.org/jira/browse/HIVE-1956 Project: Hive Issue Type: Improvement Components: Configuration, Server Infrastructure Affects Versions: 0.7.0 Reporter: Bruno Mahé Priority: Trivial Fix For: 0.7.0 Attachments: HIVE-1956.patch This script automates the creation of the Hive warehouse and scratch directories on DFS -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/392/ --- Review request for hive. Summary --- Preliminary review. This addresses bug HIVE-1694. https://issues.apache.org/jira/browse/HIVE-1694 Diffs - http://svn.apache.org/repos/asf/hive/trunk/build.xml 1067048 http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1067048 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1067048 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 1067048 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 1067048 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyCtx.java PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteCanApplyProcFactory.java PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteGBUsingIndex.java PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryCtx.java PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteIndexSubqueryProcFactory.java PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteParseContextGenerator.java PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteRemoveGroupbyCtx.java PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/RewriteRemoveGroupbyProcFactory.java PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1067048 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1067048 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 1067048 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/fatal.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION Diff: https://reviews.apache.org/r/392/diff Testing --- Thanks, John
[jira] Updated: (HIVE-1694) Accelerate GROUP BY execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-1694: - Description: The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. was: The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. 
Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. Summary: Accelerate GROUP BY execution using indexes (was: Accelerate query execution using indexes) Accelerate GROUP BY execution using indexes --- Key: HIVE-1694 URL: https://issues.apache.org/jira/browse/HIVE-1694 Project: Hive Issue Type: New Feature Components: Indexing, Query Processor Affects Versions: 0.7.0 Reporter: Nikhil Deshpande Assignee: Nikhil Deshpande Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql The index building patch (Hive-417) is checked into trunk, this JIRA issue tracks supporting indexes in Hive compiler execution engine for SELECT queries. This is in ref. to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating separate JIRA issue for tracking index usage in optimizer query execution. The aim of this effort is to use indexes to accelerate query execution (for certain class of queries). E.g. - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?) - Joins (index based joins) - Group By, Order By and other misc cases The proposal is multi-step: 1. Building index based operators, compiler and execution engine changes 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.) This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans operator implementations for above mentioned cases. -- This message is automatically generated by JIRA. 
- For more information on JIRA, see: http://www.atlassian.com/software/jira
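A sketch of the setup this rewrite targets, using the index DDL introduced by HIVE-417; the table, column, and index names are illustrative:

```sql
-- Build a compact index on the grouping column.
CREATE INDEX events_key_idx ON TABLE events (key)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
ALTER INDEX events_key_idx ON events REBUILD;

-- A GROUP BY on the indexed column is the kind of query the optimizer
-- changes in this issue aim to answer from the (much smaller) index
-- table rather than the base table.
SELECT key, COUNT(1) FROM events GROUP BY key;
```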
Build failed in Hudson: Hive-trunk-h0.20 #532
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/532/changes Changes: [namit] HIVE-1951 input16_cc.q is failing in testminimrclidriver (He Yongqiang via namit) -- [...truncated 22598 lines...] [junit] POSTHOOK: Output: default@srcbucket [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt [junit] Loading data to table srcbucket [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket [junit] OK [junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2 [junit] PREHOOK: type: LOAD 
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table src [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt [junit] Loading data to table src1 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src1 [junit] OK
[jira] Commented: (HIVE-1956) Provide DFS initialization script for Hive
[ https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12990429#comment-12990429 ] Namit Jain commented on HIVE-1956: -- +1 Provide DFS initialization script for Hive --- Key: HIVE-1956 URL: https://issues.apache.org/jira/browse/HIVE-1956 Project: Hive Issue Type: Improvement Components: Configuration, Server Infrastructure Affects Versions: 0.7.0 Reporter: Bruno Mahé Priority: Trivial Fix For: 0.7.0 Attachments: HIVE-1956.patch This script automates the creation of the Hive warehouse and scratch directories on DFS -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Hudson: Hive-trunk-h0.20 #533
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/533/changes

Changes:

[heyongqiang] HIVE-1952. fix some outputs and make some tests deterministic (namit via He Yongqiang)

--
[...truncated 21915 lines...]
    [junit] POSTHOOK: Output: default@srcbucket
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
    [junit] Loading data to table srcbucket
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket
    [junit] OK
    [junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
    [junit] PREHOOK: type: CREATETABLE
    [junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
    [junit] POSTHOOK: type: CREATETABLE
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
    [junit] Loading data to table srcbucket2
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@srcbucket2
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
    [junit] Loading data to table src
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@src
    [junit] OK
    [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1
    [junit] PREHOOK: type: LOAD
    [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
    [junit] Loading data to table src1
    [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1
    [junit] POSTHOOK: type: LOAD
    [junit] POSTHOOK: Output: default@src1
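The test log above repeatedly loads files into srcbucket2, which was created CLUSTERED BY (key) INTO 4 BUCKETS. For readers unfamiliar with bucketing, here is a minimal sketch (a simplification, not Hive's actual ObjectInspector-based hashing) of how rows are routed to one of the four bucket files:

```python
def bucket_for(key: int, num_buckets: int = 4) -> int:
    # Simplified stand-in for Hive's bucketing: hash the CLUSTERED BY
    # column and take it modulo the declared bucket count.  For an int
    # column Hive's hash is the value itself, which Python's hash()
    # mirrors for small non-negative ints.
    return hash(key) % num_buckets

# Rows with the same key always land in the same bucket file.
rows = [(0, "val_0"), (4, "val_4"), (8, "val_8"), (5, "val_5")]
buckets = {}
for key, value in rows:
    buckets.setdefault(bucket_for(key), []).append((key, value))
```

This is why the four srcbucket2 data files correspond to the four declared buckets: each file holds the rows whose clustering-column hash maps to that bucket.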
[jira] Updated: (HIVE-1956) Provide DFS initialization script for Hive
     [ https://issues.apache.org/jira/browse/HIVE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1956:
-----------------------------

    Resolution: Fixed
  Hadoop Flags: [Reviewed]
        Status: Resolved  (was: Patch Available)

Committed. Thanks Bruno

> Provide DFS initialization script for Hive
> ------------------------------------------
>
>                 Key: HIVE-1956
>                 URL: https://issues.apache.org/jira/browse/HIVE-1956
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration, Server Infrastructure
>    Affects Versions: 0.7.0
>            Reporter: Bruno Mahé
>            Priority: Trivial
>             Fix For: 0.7.0
>         Attachments: HIVE-1956.patch
>
> This script automates the creation of the Hive warehouse and scratch directories on DFS

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
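The attached HIVE-1956.patch is not reproduced in this message, but the setup it automates is the standard one from the Hive getting-started instructions. As an illustrative sketch (in Python only to show the intended command sequence; the default paths and the g+w permission are assumptions based on the usual Hive setup, not the patch's exact contents):

```python
def dfs_init_commands(warehouse="/user/hive/warehouse", scratch="/tmp/hive"):
    """Return the hadoop fs command lines a Hive DFS init script would run.

    The paths and the group-writable bit follow Hive's documented manual
    setup; treat them as illustrative defaults, not HIVE-1956's contents.
    """
    cmds = []
    for path in (warehouse, scratch):
        cmds.append(f"hadoop fs -mkdir {path}")      # create the directory
        cmds.append(f"hadoop fs -chmod g+w {path}")  # let the hive group write
    return cmds
```

Automating these few commands removes a common first-run failure where Hive cannot write to a missing or read-only warehouse directory.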
[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes
    [ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990466#comment-12990466 ]

Prajakta Kalmegh commented on HIVE-1694:
----------------------------------------

Thanks John. We will ensure that henceforth.

> Accelerate GROUP BY execution using indexes
> -------------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Nikhil Deshpande
>         Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql
>
> The index building patch (HIVE-417) is checked into trunk; this JIRA issue tracks supporting indexes in the Hive compiler & execution engine for SELECT queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating a separate JIRA issue for tracking index usage in optimizer & query execution.
> The aim of this effort is to use indexes to accelerate query execution (for certain classes of queries), e.g.:
> - Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold the information about index based plans & operator implementations for above mentioned cases.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
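To make the idea behind the issue concrete, here is a toy model (Python, with a hypothetical table and index layout, not Hive's actual compact-index format) of why an index can answer a COUNT(*) ... GROUP BY without scanning the base table: the index already holds one entry per distinct key plus the matching row positions, so the aggregate falls out of the entry sizes.

```python
from collections import defaultdict

# Base table: (key, value) rows, as a full-scan GROUP BY would read them.
table = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5), ("b", 6)]

# A compact index: one entry per distinct key, listing the row offsets
# where that key occurs (a stand-in for index buckets/offsets).
index = defaultdict(list)
for offset, (key, _) in enumerate(table):
    index[key].append(offset)

# SELECT key, COUNT(*) FROM table GROUP BY key, answered two ways:
def group_count_full_scan():
    counts = defaultdict(int)
    for key, _ in table:  # touches every base-table row
        counts[key] += 1
    return dict(counts)

def group_count_from_index():
    # touches only the (typically much smaller) index entries
    return {key: len(offsets) for key, offsets in index.items()}

assert group_count_full_scan() == group_count_from_index()
```

The optimizer work mentioned in step 2 is what decides when the index-backed plan is actually cheaper than the full scan.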
[jira] Updated: (HIVE-1948) Have audit logging in the Metastore
    [ https://issues.apache.org/jira/browse/HIVE-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HIVE-1948:
------------------------------

    Attachment: audit-log.1.patch

A slightly updated patch.

> Have audit logging in the Metastore
> -----------------------------------
>
>                 Key: HIVE-1948
>                 URL: https://issues.apache.org/jira/browse/HIVE-1948
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.7.0
>         Attachments: audit-log.1.patch, audit-log.patch
>
> It would be good to have audit logging in the metastore, similar to Hadoop's NameNode audit logging. This would allow administrators to dig into details about which user performed metadata operations (like create/drop tables/partitions) and from where (IP address).

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
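The patch itself is attached above; as a sketch of the NameNode-style format the description points to, here is one way such an audit line could be assembled (the field names are assumptions modeled on HDFS audit log lines, not the actual output of audit-log.1.patch):

```python
def format_audit_entry(ugi, ip, cmd, **extra):
    """Build a NameNode-style audit line: tab-separated key=value fields.

    HDFS audit lines carry ugi (the authenticated user), ip (the caller's
    address), and cmd (the operation); the extra fields (e.g. tbl=) are
    illustrative additions for metastore objects, not HIVE-1948's schema.
    """
    fields = [("ugi", ugi), ("ip", ip), ("cmd", cmd)] + sorted(extra.items())
    return "\t".join(f"{k}={v}" for k, v in fields)

# e.g. a create-table operation recorded for the administrator:
line = format_audit_entry("bob", "/10.1.2.3", "create_table", tbl="sample")
```

Keeping the line as flat key=value pairs makes it trivially greppable, which is exactly the property administrators rely on with the NameNode audit log.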