Review Request: HIVE-2084. Upgrade datanucleus from 2.0.3 to 2.2.3
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/537/ ---

Review request for hive.

Summary
---
Upgrading datanucleus to the latest stable version.

Diffs
- trunk/ivy/libraries.properties 1087196
- trunk/metastore/ivy.xml 1087196
- trunk/metastore/src/model/package.jdo 1087196
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1087196

Diff: https://reviews.apache.org/r/537/diff

Testing
---
Testing using the script provided by HIVE-1862 while adding partitions to the input table. The test has been running for several hours and so far so good.

Thanks,
Ning
[jira] [Updated] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-2084:
- Attachment: HIVE-2084.patch

Review board: https://reviews.apache.org/r/537/

Testing using the script provided by HIVE-1862 while adding partitions to the input table. The test has been running for several hours and so far so good.

Upgrade datanucleus from 2.0.3 to 2.2.3

Key: HIVE-2084
URL: https://issues.apache.org/jira/browse/HIVE-2084
Project: Hive
Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-2084.patch

It seems datanucleus 2.2.3 does a better job in caching: fetching the same set of partition objects a second time takes about 1/4 of the time the first fetch took, while with 2.0.3 the second execution took almost as long. We should rerun the test case mentioned in HIVE-1853 and HIVE-1862.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013852#comment-13013852 ]

Ning Zhang commented on HIVE-2084:

Devaraj/Mac, could you also test this patch in your production environment if you have other testing scripts? I'm testing the script uploaded by Namit in HIVE-1862 and this patch seems to be working fine so far.
[jira] [Updated] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-2084:
- Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-1872) Hive process is exiting on executing ALTER query
[ https://issues.apache.org/jira/browse/HIVE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013960#comment-13013960 ]

Chinna Rao Lalam commented on HIVE-1872:

I have a few doubts here. Just returning 9 will solve the problem of the JVM exiting, but all running tasks (MapReduce and non-MapReduce) will continue to run if the main thread does not return (as in the HiveServer case), thereby wasting cluster resources. So the problem is not only the JVM exiting but also the tasks left running. Any suggestions on this?

Hive process is exiting on executing ALTER query

Key: HIVE-1872
URL: https://issues.apache.org/jira/browse/HIVE-1872
Project: Hive
Issue Type: Bug
Components: CLI, Server Infrastructure
Affects Versions: 0.6.0
Environment: SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (3), Hadoop 0.20.1, Hive 0.6.0
Reporter: Bharath R
Assignee: Bharath R
Attachments: HIVE-1872.1.patch

The Hive process exits after executing the queries below, in the order given:

1) CREATE TABLE SAMPLETABLE(IP STRING, showtime BIGINT) partitioned by (ds string, ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040'
2) ALTER TABLE SAMPLETABLE add Partition(ds='sf') location '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse'

After the second query, Hive throws the exception below and the process exits:

10:09:03 ERROR exec.DDLTask: FAILED: Error in metadata: table is partitioned but partition spec is not specified or tab: {ipz=100}
org.apache.hadoop.hive.ql.metadata.HiveException: table is partitioned but partition spec is not specified or tab: {ipz=100}
at org.apache.hadoop.hive.ql.metadata.Table.isValidSpec(Table.java:341)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:902)
at org.apache.hadoop.hive.ql.exec.DDLTask.addPartition(DDLTask.java:282)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:191)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:114)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:378)
at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:366)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

The exception itself is expected, since the ALTER query is incorrect; ideally it should be ALTER TABLE SAMPLETABLE add Partition(ds='sf', ipz=100) location '/user/hive/warehouse'. But it is not good for the Hive process to exit when a query is incorrect.
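The concern above can be sketched with plain java.util.concurrent primitives. This is an illustrative model only (GracefulFailureSketch and its method are hypothetical, not Hive's actual Driver/TaskRunner API): on an invalid query, return a non-zero code instead of calling System.exit, and also cancel already-submitted tasks so they stop consuming resources.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not Hive's real API): on a bad query, return an error
// code instead of calling System.exit(9), AND stop in-flight tasks so they
// do not keep running after the query has already failed.
public class GracefulFailureSketch {

    public static int execute(ExecutorService taskPool, boolean queryIsValid)
            throws InterruptedException {
        if (!queryIsValid) {
            // Returning 9 (instead of System.exit(9)) keeps the server JVM
            // alive, but we must also cancel submitted tasks, or they keep
            // consuming cluster resources as the comment above points out.
            taskPool.shutdownNow(); // interrupts running tasks
            taskPool.awaitTermination(10, TimeUnit.SECONDS);
            return 9; // error code propagated to the caller; JVM keeps running
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { /* stand-in for a long-running task */ });
        int rc = execute(pool, false);
        System.out.println("return code: " + rc + ", pool stopped: " + pool.isShutdown());
    }
}
```

The design point is that error handling in a long-lived server has two halves: signalling the failure (the return code) and reclaiming the resources the failed query already acquired (the shutdown of its tasks).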
Review Request: Few code improvements in the ql and serde packages.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/535/ ---

Review request for hive.

Summary
---
A few code improvements in the ql and serde packages:
1) Small performance improvements
2) Null checks to avoid NPEs
3) Effective variable management

This addresses bug HIVE-2080. https://issues.apache.org/jira/browse/HIVE-2080

Diffs
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FilterOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/UnionOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ASTNode.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1086514
- trunk/serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeField.java 1086514
- trunk/serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeFieldType.java 1086514
- trunk/serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeFunction.java 1086514

Diff: https://reviews.apache.org/r/535/diff

Testing
---
Ran the tests. All tests passed.

Thanks,
chinna
[jira] [Commented] (HIVE-2080) Few code improvements in the ql and serde packages.
[ https://issues.apache.org/jira/browse/HIVE-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014005#comment-13014005 ]

jirapos...@reviews.apache.org commented on HIVE-2080:

(This comment mirrors the review request at https://reviews.apache.org/r/535/ above; see that message for the summary, diffs, and testing notes.)

Few code improvements in the ql and serde packages.

Key: HIVE-2080
URL: https://issues.apache.org/jira/browse/HIVE-2080
Project: Hive
Issue Type: Bug
Components: Query Processor, Serializers/Deserializers
Affects Versions: 0.7.0
Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Attachments: HIVE-2080.Patch

A few code improvements in the ql and serde packages:
1) Small performance improvements
2) Null checks to avoid NPEs
3) Effective variable management
Build failed in Jenkins: Hive-0.7.0-h0.20 #60
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/60/

--
Started by timer
Building remotely on ubuntu1
Cleaning workspace https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/
Checking out http://svn.apache.org/repos/asf/hive/branches/branch-0.7
ERROR: Failed to check out http://svn.apache.org/repos/asf/hive/branches/branch-0.7
org.tmatesoft.svn.core.SVNException: svn: connection refused by the server
svn: OPTIONS request failed on '/repos/asf/hive/branches/branch-0.7'
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:106)
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:90)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:629)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:275)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:263)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:516)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:98)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1001)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:178)
at org.tmatesoft.svn.core.wc.SVNBasicClient.getRevisionNumber(SVNBasicClient.java:482)
at org.tmatesoft.svn.core.wc.SVNBasicClient.getLocations(SVNBasicClient.java:873)
at org.tmatesoft.svn.core.wc.SVNBasicClient.createRepository(SVNBasicClient.java:534)
at org.tmatesoft.svn.core.wc.SVNUpdateClient.doCheckout(SVNUpdateClient.java:901)
at hudson.scm.subversion.CheckoutUpdater$1.perform(CheckoutUpdater.java:83)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:137)
at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:725)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:706)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:690)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:1944)
at hudson.remoting.UserRequest.perform(UserRequest.java:114)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:270)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at org.tmatesoft.svn.core.internal.util.SVNSocketConnection.run(SVNSocketConnection.java:57)
... 1 more
Recording test results
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014065#comment-13014065 ]

Krishna Kumar commented on HIVE-2065:

The RCFile layout seems to have been designed initially to be compatible with SequenceFile, but over time (especially due to the key-compression enhancement?) it has drifted away. The compatibility intent goes as far as keeping certain boolean values always false (blockCompression), but a couple of bugs have been introduced along the way, whereby the record length is no longer the on-disk record length and the key-length field is no longer the on-disk key length. Once I started writing a unit test to ensure that the RCFile layout stays in sync with the SequenceFile layout, I also found that the classes designated as the key class/value class are no longer able to read themselves in or write themselves out, even if properly 'primed'. That is the primary aim of the changes under #3.

[PS: The reason I am looking into this now is to experiment with column-specific compression ('use this codec for this sorted, numeric column') or type-specific compression ('use this codec for all enumeration types of this table'). Presumably, if successful, this information will be put into metadata, as I am doing with the generic codec in the changes above.]

RCFile issues

Key: HIVE-2065
URL: https://issues.apache.org/jira/browse/HIVE-2065
Project: Hive
Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
Attachments: HIVE.2065.patch.0.txt, Slide1.png, proposal.png

Some potential issues with RCFile:

1. Remove unwanted synchronized modifiers on the methods of RCFile. Per Yongqiang He, the class is not meant to be thread-safe (and it is not), so we might as well get rid of the confusing and performance-impacting lock acquisitions.

2. Record length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}

out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength);               // key portion length

if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}

3. For SequenceFile compatibility, the compressed key length should be the field immediately following the record length, not the uncompressed key length.
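Points 2 and 3 can be illustrated with a self-contained sketch (the class and method names here are hypothetical, not RCFile's actual code): once the key is deflated, a record-length field computed from the uncompressed key size no longer matches the bytes actually written to disk.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Illustration of the length mismatch described above (not RCFile's real API):
// compare the record length the writer *states* in its header against the
// bytes it actually puts on disk once the key is compressed.
public class KeyLengthSketch {

    // Deflate a byte array with the default zlib settings.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Record length as written in the snippet above:
    // uncompressed key length + value length.
    static int statedRecordLength(byte[] key, byte[] value) {
        return key.length + value.length;
    }

    // Bytes that actually land on disk after the length fields:
    // the *compressed* key plus the value.
    static int onDiskRecordLength(byte[] key, byte[] value) {
        return deflate(key).length + value.length;
    }
}
```

For a compressible key (say, 1 KB of repeated bytes) statedRecordLength comes out larger than onDiskRecordLength, which is exactly the "record length overstated" problem in point 2; a SequenceFile-compatible layout would record the compressed size instead.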
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014090#comment-13014090 ]

Mac Yang commented on HIVE-2084:

Ning, I will test this patch with our usual setup and let you know how it goes.
[jira] [Commented] (HIVE-1872) Hive process is exiting on executing ALTER query
[ https://issues.apache.org/jira/browse/HIVE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014096#comment-13014096 ]

John Sichi commented on HIVE-1872:

Good point about the Hive server. As a short-term fix for the flaky tests, adding a config param that is only enabled for tests should still work fine, but I agree that it would be best to address the bigger issues.
[jira] [Created] (HIVE-2086) Data loss with external table
Data loss with external table

Key: HIVE-2086
URL: https://issues.apache.org/jira/browse/HIVE-2086
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.7.0
Environment: Amazon Elastic MapReduce cluster
Reporter: Q Long

Data loss when using a "create external table ... like" statement:

1) Set up an external table S pointing to location L. Populate data in S.
2) Create another external table T with a statement like: create external table T like S location L. Make sure table T points to the same location as the original table S.
3) Query table T; you see the same set of data as in S.
4) Drop table T.
5) Querying table S now returns nothing, and location L has been deleted.
[jira] [Commented] (HIVE-2086) Data loss with external table
[ https://issues.apache.org/jira/browse/HIVE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014109#comment-13014109 ]

Edward Capriolo commented on HIVE-2086:

Dropping an external table should not delete data. Are you saying that 'create table like' does not preserve the external property?
[jira] [Created] (HIVE-2087) Dynamic partition insert performance problem
Dynamic partition insert performance problem

Key: HIVE-2087
URL: https://issues.apache.org/jira/browse/HIVE-2087
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.7.0
Environment: Amazon EMR, S3
Reporter: Q Long

Create an external (S3-backed) table T, partitioned by column P. Populate table T so it has a large number of partitions (say 100). Execute a statement like:

insert overwrite table T partition (p) select * from another_table

The Hive server log shows that all existing partitions are read and loaded before any mapper starts working. This feels excessive, given that the insert statement may only create or overwrite a very small number of partitions. Is there another reason that an insert using dynamic partitions requires loading the whole table?
[jira] [Commented] (HIVE-2086) Data loss with external table
[ https://issues.apache.org/jira/browse/HIVE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014126#comment-13014126 ]

Q Long commented on HIVE-2086:

It seems that "create external table ... like" does not preserve the external property. Note that both the original table S and the new table T are external, and the data loss only occurs when creating T with the statement "create external table T like S location L". There is no data loss if T is created with full table definitions (i.e., without the "like" clause).
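A minimal sketch of the guard implied by this discussion: Hive marks external tables with the table parameter EXTERNAL=TRUE, and only a managed (non-external) table's data should be removed on drop. The class and helper below are hypothetical, for illustration only; the bug report suggests the LIKE path produced a table whose metadata lost this flag, so the drop path deleted location L out from under table S.

```java
import java.util.Map;

// Hypothetical sketch (not Hive's actual drop-table code): decide whether the
// data directory may be deleted by checking the EXTERNAL table parameter,
// which Hive sets to TRUE for external tables.
public class DropGuardSketch {

    static boolean shouldDeleteData(Map<String, String> tableParams) {
        // Any table explicitly flagged EXTERNAL=TRUE keeps its data on drop;
        // everything else is treated as managed, so its directory is removed.
        return !"TRUE".equalsIgnoreCase(tableParams.get("EXTERNAL"));
    }
}
```

With such a guard, a table created via "create external table ... like" is only safe if the LIKE path copies the EXTERNAL parameter into the new table's metadata, which is the property the reporter found missing.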
Build failed in Jenkins: Hive-trunk-h0.20 #650
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/650/

--
[...truncated 29878 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-31_12-18-19_223_5644671997450329108/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0
[junit] 2011-03-31 12:18:22,287 null map = 100%, reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-31_12-18-19_223_5644671997450329108/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103311218_2034159440.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-31_12-18-23_706_5212736400386072146/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-31_12-18-23_706_5212736400386072146/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103311218_408920310.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014166#comment-13014166 ]

Namit Jain commented on HIVE-2084:

@Ning/Paul, do you know whether datanucleus 2.2.3 can support filter pushdown for non-equality predicates?
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014168#comment-13014168 ] Namit Jain commented on HIVE-2084: -- +1. The code changes look good, but I will wait for confirmation from Mac before checking it in.
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014182#comment-13014182 ] Ning Zhang commented on HIVE-2084: -- @Namit, yes, 2.2.3 supports filter pushdown for non-equality predicates; even the older 2.0.3 supports it too. Mac's patch actually supports range queries, but since range queries can be complicated across multiple partition columns (what if the range is on a column that is not the top partition column?), I didn't dig deep into it; the pushdown filtering criteria can certainly be relaxed, though. That said, my test results show that JDO filter pushdown may not be the dominant factor (compared to the patch in HIVE-2050). In the experiments I ran for HIVE-2050, listing partition names and filtering partitions on the Hive client side may take 10 seconds, but retrieving all Partition objects takes about 10 minutes in total. At best, pushing down the JDO filter reduces the 10 seconds to zero, but the 10-minute overhead is still there. We need to find a way to optimize that away.
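The 10-second client-side step Ning mentions can be pictured as filtering partition *names* before any Partition objects are fetched from the metastore. The following is a minimal standalone sketch (a hypothetical helper, not Hive code) showing why a range on the top partition column is the easy case:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionNameFilter {
    // Hypothetical sketch of client-side partition pruning: keep only the
    // partition names whose top-level column value falls in [lo, hi].
    // Partition names are assumed to look like "ds=2011-03-31/hr=12".
    static List<String> filterByTopColumn(List<String> partNames,
                                          String col, String lo, String hi) {
        List<String> kept = new ArrayList<>();
        for (String name : partNames) {
            // Only the first (top-level) key=value component is inspected.
            String[] kv = name.split("/")[0].split("=", 2);
            if (kv[0].equals(col)
                    && kv[1].compareTo(lo) >= 0 && kv[1].compareTo(hi) <= 0) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

A range on a non-top column would require parsing every component of every name, which is the complication Ning alludes to; and whatever happens here, the full Partition objects for the surviving names still have to be fetched afterwards.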
[jira] [Updated] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2084: - Status: Open (was: Patch Available) @Ning: Why did you modify the class mapping in package.jdo? Does this require a metastore upgrade script?
[jira] [Commented] (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014192#comment-13014192 ] John Sichi commented on HIVE-1555: -- Thanks a lot, I've linked your PDF directly from the [[Hive/DesignDocs]] wiki page. JDBC Storage Handler Key: HIVE-1555 URL: https://issues.apache.org/jira/browse/HIVE-1555 Project: Hive Issue Type: New Feature Components: JDBC Reporter: Bob Robertson Assignee: Andrew Wilson Attachments: JDBCStorageHandler Design Doc.pdf Original Estimate: 24h Remaining Estimate: 24h With the Cassandra and HBase Storage Handlers I thought it would make sense to include a generic JDBC RDBMS Storage Handler so that you could import a standard DB table into Hive. Many people must want to perform HiveQL joins, etc against tables in other systems etc.
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014240#comment-13014240 ] Ning Zhang commented on HIVE-2084: -- @Carl, one change (at line 49) in package.jdo fixes a bug that was not exposed by the old datanucleus version. Without the change, datanucleus throws an exception at runtime (FCOMMENT is not a column of the COLUMNS table). I guess the old version of datanucleus didn't check the MFieldSchema mapping in package.jdo and only retrieved the columns mentioned in the embedded elements. The other changes make the legacy mappings conform to the current relational schema (e.g., MFieldSchema.FNAME should be mapped to COLUMNS.COLUMN_NAME). They do not currently cause any runtime exceptions, but it's better to fix them proactively if we are sure the relational mapping is wrong.
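For readers unfamiliar with package.jdo, the kind of mapping under discussion looks roughly like the following. This is an illustrative fragment with plausible attribute values, not the exact HIVE-2084 diff; the point is that each embedded MFieldSchema field must name a column that actually exists in the COLUMNS table:

```xml
<!-- Illustrative sketch only, not the committed package.jdo change -->
<field name="cols" table="COLUMNS">
  <collection element-type="MFieldSchema"/>
  <join/>
  <element>
    <embedded>
      <field name="name">
        <column name="COLUMN_NAME" length="128" allows-null="false"/>
      </field>
      <field name="type">
        <column name="TYPE_NAME" length="4000" allows-null="true"/>
      </field>
      <field name="comment">
        <column name="COMMENT" length="256" allows-null="true"/>
      </field>
    </embedded>
  </element>
</field>
```

If an embedded field here referenced a non-existent column (as with FCOMMENT above), older DataNucleus versions silently ignored it, while 2.2.3 validates the mapping and fails at runtime.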
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.9.patch Uploaded new patch that addresses John's comments on patch 8. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing.
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.10.patch Update patch to include more missing javadocs.
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014294#comment-13014294 ] Carl Steinbach commented on HIVE-2084: -- bq. One change (at line 49) in package.jdo fixes a bug that was not exposed by the old datanucleus version. Without the change, datanucleus throws an exception at runtime (FCOMMENT is not a column of the COLUMNS table). I guess the old version of datanucleus didn't check the MFieldSchema mapping in package.jdo and only retrieved the columns mentioned in the embedded elements. Yup, looks like that's the case. It also looks like Datanucleus was ignoring the size of the FCOMMENTS field, so the older versions of TYPE_FIELDS.COMMENT and COLUMNS.COMMENT have size 256, which must be the default value. In the new schema these fields both get bumped to 4000 bytes, which is the correct size. Can you please include upgrade scripts that update the size of these columns accordingly? Also, as far as I can tell the change to the MOrder mapping has no effect, since it is only referenced by the SORT_COLS table, which overrides the name to COLUMN_NAME instead.
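The upgrade script Carl requests would amount to something like the following. This is a hedged sketch only: MySQL syntax is assumed, the table and column names come from the comment above, and the committed script (which would live with the other metastore upgrade scripts) may differ:

```sql
-- Hypothetical upgrade fragment, not the committed script:
-- widen the comment columns from the old 256-byte default to 4000 bytes.
ALTER TABLE COLUMNS     MODIFY COMMENT VARCHAR(4000);
ALTER TABLE TYPE_FIELDS MODIFY COMMENT VARCHAR(4000);
```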
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014302#comment-13014302 ] Mac Yang commented on HIVE-2084: Test has been running for about six hours without failure. Looks good.
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014351#comment-13014351 ] Krishna Kumar commented on HIVE-2065: - As I indicated, the reason I made changes for sequence file compatibility is that the original design was more or less compatible, but the compatibility is now broken. If we decide that compatibility is not a requirement, I am fine with that; a few documentation changes will be all that is necessary to describe that situation. The current layout itself, with the SEQ prefix, incorrect key length, keyclass/valueclass header, etc., will not make much sense, but we can designate all that 'legacy' ;) I'd like to move the discussion of potential approaches for better compression to another thread; is there an existing JIRA, or should I open a new one? RCFile issues - Key: HIVE-2065 URL: https://issues.apache.org/jira/browse/HIVE-2065 Project: Hive Issue Type: Bug Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE.2065.patch.0.txt, Slide1.png, proposal.png Some potential issues with RCFile 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions. 2. Record Length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}
out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength);               // key portion length
if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}
3. For sequence file compatibility, the compressed key length should be the field following the record length, not the uncompressed key length.
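Point 2 above (the overstated record length) can be illustrated with a self-contained sketch. This is not the RCFile code itself, just a demonstration that both header fields should count the compressed key bytes actually written to the stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;

public class RecordHeaderSketch {
    // Hedged sketch, not RCFile: compress the key first, then write the
    // headers so that "total record length" and "key portion length" both
    // refer to the compressed bytes that actually follow on the stream.
    static byte[] writeRecord(byte[] key, byte[] value) {
        try {
            ByteArrayOutputStream keyBuf = new ByteArrayOutputStream();
            DeflaterOutputStream deflate = new DeflaterOutputStream(keyBuf);
            deflate.write(key);
            deflate.finish();
            byte[] compressedKey = keyBuf.toByteArray();

            ByteArrayOutputStream record = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(record);
            out.writeInt(compressedKey.length + value.length); // total record length
            out.writeInt(compressedKey.length);                // key portion length
            out.write(compressedKey);
            out.write(value);
            return record.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams should not fail
        }
    }

    // Read a big-endian int, matching what DataOutputStream.writeInt produced.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }
}
```

With this layout the declared record length equals the byte count on the wire, whereas writing the uncompressed key length (as quoted above) overstates it whenever compression shrinks the key.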
Build failed in Jenkins: Hive-0.7.0-h0.20 #61
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/61/ -- [...truncated 27372 lines...] [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103311907_750985713.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-31_19-07-42_271_3265981684882944171/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set 
mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] 2011-03-31 19:07:45,321 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-31_19-07-42_271_3265981684882944171/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103311907_347201241.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD 
[junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-31_19-07-46_874_7482268897505483011/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-31_19-07-46_874_7482268897505483011/-mr-1 [junit] OK [junit] PREHOOK: query: drop table
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014361#comment-13014361 ] He Yongqiang commented on HIVE-2065: Let's set the compatibility issue aside and fix the incorrect length issue in this jira. Feel free to open a new jira for the discussion of better compression.
[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014394#comment-13014394 ] Russell Melick commented on HIVE-1644: -- I'm having trouble getting the partitions from an Index. I do not know how to get back to the index table, so I cannot use getPartCols(). I would like to do something like this, but I don't know how to get the indexTable.
{code:java}
for (Index index : indexes.get(part.getTable())) {
  Table indexTable;
  indexTable = ???
  List<FieldSchema> indexPartitions = indexTable.getPartCols();
  for (FieldSchema col : part.getCols()) {
    if (!indexPartitions.contains(col)) {
      return null;
    }
  }
}
{code}
use filter pushdown for automatically accessing indexes --- Key: HIVE-1644 URL: https://issues.apache.org/jira/browse/HIVE-1644 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.7.0 Reporter: John Sichi Assignee: Russell Melick Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014398#comment-13014398 ] He Yongqiang commented on HIVE-1644: You have the list of partitions for the original table; you just need to find out whether those partition names exist on the index table. So getParitionByName() (pls check the code to find out the exact name) should work.
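Yongqiang's suggestion boils down to a containment check on partition names rather than comparing partition columns. A trivial sketch (hypothetical method and parameter names, not the Hive metastore API):

```java
import java.util.Collection;
import java.util.Set;

public class IndexPartitionCheck {
    // Hedged sketch: an index is usable for the query only if every partition
    // of the base table also exists (by name) on the index table. The caller
    // would obtain indexPartNames via something like getPartitionByName()
    // on the index table, as suggested above.
    static boolean indexCoversPartitions(Collection<String> basePartNames,
                                         Set<String> indexPartNames) {
        return indexPartNames.containsAll(basePartNames);
    }
}
```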
Hive Client is waiting indefinitely on a socket read
Hi All, the Hive client is waiting indefinitely on a socket read; the thread dump is included below. Cause: when the HiveClient's socket is created, the read timeout is set to 0, so the socket will wait indefinitely when the machine running the Hive Server is shut down or the network is unplugged. The same may not happen if the HiveServer alone is killed or gracefully shut down; in that case the client gets a connection reset exception. Code in HiveConnection ---
transport = new TSocket(host, port);
TProtocol protocol = new TBinaryProtocol(transport);
client = new HiveClient(protocol);
On the client side, the query is sent and then the client waits for the response:
send_execute(query, id);
recv_execute(); // place where client waiting is initiated
We cannot simply give the socket a timeout either, because a query may run for a long time and its execution time cannot be predetermined. Any suggestions for fixing this issue?
Thread dump:
"main" prio=10 tid=0x40111000 nid=0x3641 runnable [0x7f0d73f29000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	- locked <0x7f0d5d3f0828> (a java.io.BufferedInputStream)
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:125)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
	at org.apache.hadoop.hive.service.ThriftHive$Client.recv_execute(ThriftHive.java:130)
	at org.apache.hadoop.hive.service.ThriftHive$Client.execute(ThriftHive.java:109)
	- locked <0x7f0d5d3f0878> (a org.apache.thrift.transport.TSocket)
	at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:218)
	at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:154)
	at com.huawei.isap.i3.HiveJdbcClient.main(HiveJdbcClient.java:114)
Thanks & Regards, Chinna Rao Lalam
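One possible direction (a sketch only, not Hive code): keep the socket timeout unbounded, but run the blocking receive on a worker thread and poll its Future with a bounded wait, giving the client a hook to probe server liveness between waits instead of blocking forever. All names here are hypothetical:

```java
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {
    // Hedged sketch: poll a pending result with a bounded wait. Between
    // waits a real client could ping the server (e.g. with a cheap RPC)
    // and abort early if the server is unreachable, while still allowing
    // legitimately long-running queries up to maxChecks * checkEveryMs.
    static <T> T waitWithHeartbeat(Future<T> pending, long checkEveryMs, int maxChecks)
            throws Exception {
        for (int i = 0; i < maxChecks; i++) {
            try {
                return pending.get(checkEveryMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Not done yet: this is where a liveness probe would go.
            }
        }
        throw new TimeoutException("server did not respond");
    }
}
```

The trade-off is choosing the liveness probe: it must be cheap and must not itself block without a timeout, otherwise the original problem reappears.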