Review Request: HIVE-2084. Upgrade datanucleus from 2.0.3 to 2.2.3
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/537/ ---

Review request for hive.

Summary
---
Upgrading datanucleus to the latest stable version.

Diffs
- trunk/ivy/libraries.properties 1087196
- trunk/metastore/ivy.xml 1087196
- trunk/metastore/src/model/package.jdo 1087196
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1087196

Diff: https://reviews.apache.org/r/537/diff

Testing
---
Testing using the script provided by HIVE-1862 while adding partitions to the input table. The test has been running for several hours and so far so good.

Thanks,
Ning
[jira] [Updated] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-2084:
- Attachment: HIVE-2084.patch

Review board: https://reviews.apache.org/r/537/

Testing using the script provided by HIVE-1862 while adding partitions to the input table. The test has been running for several hours and so far so good.

Upgrade datanucleus from 2.0.3 to 2.2.3

Key: HIVE-2084
URL: https://issues.apache.org/jira/browse/HIVE-2084
Project: Hive
Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-2084.patch

It seems datanucleus 2.2.3 does a better job in caching: fetching the same set of partition objects a second time takes about 1/4 of the time the first fetch took, while with 2.0.3 the second execution took almost as long. We should rerun the test case mentioned in HIVE-1853 and HIVE-1862.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013852#comment-13013852 ]

Ning Zhang commented on HIVE-2084:

Devaraj/Mac, could you also test this patch in your production environment if you have other testing scripts? I'm testing the script uploaded by Namit in HIVE-1862 and this patch seems to be working fine so far.
[jira] [Updated] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-2084:
- Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-1872) Hive process is exiting on executing ALTER query
[ https://issues.apache.org/jira/browse/HIVE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013960#comment-13013960 ]

Chinna Rao Lalam commented on HIVE-1872:

I have a few doubts here. Just returning 9 will solve the problem of the JVM exiting, but all running tasks (MapReduce and non-MapReduce) will continue to run if the main thread does not return (as in the HiveServer case), thereby wasting cluster resources. So the problem is not only the JVM exiting but also the tasks left running. Any suggestions on this?

Hive process is exiting on executing ALTER query

Key: HIVE-1872
URL: https://issues.apache.org/jira/browse/HIVE-1872
Project: Hive
Issue Type: Bug
Components: CLI, Server Infrastructure
Affects Versions: 0.6.0
Environment: SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (3), Hadoop 0.20.1, Hive 0.6.0
Reporter: Bharath R
Assignee: Bharath R
Attachments: HIVE-1872.1.patch

The Hive process exits after executing the queries below, in the order given:

1) CREATE TABLE SAMPLETABLE(IP STRING, showtime BIGINT) partitioned by (ds string, ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040'
2) ALTER TABLE SAMPLETABLE add Partition(ds='sf') location '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse'

After the second query, Hive throws the exception below and the process exits:

10:09:03 ERROR exec.DDLTask: FAILED: Error in metadata: table is partitioned but partition spec is not specified or tab: {ipz=100}
org.apache.hadoop.hive.ql.metadata.HiveException: table is partitioned but partition spec is not specified or tab: {ipz=100}
at org.apache.hadoop.hive.ql.metadata.Table.isValidSpec(Table.java:341)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:902)
at org.apache.hadoop.hive.ql.exec.DDLTask.addPartition(DDLTask.java:282)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:191)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:114)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:378)
at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:366)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

The exception itself is expected, since the ALTER query is incorrect; ideally it should be ALTER TABLE SAMPLETABLE add Partition(ds='sf', ipz=100) location '/user/hive/warehouse'. But it is not good for the Hive process to exit when a query is incorrect.
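The concern above can be sketched with plain java.util.concurrent primitives. This is an illustrative model only (GracefulFailureSketch and its method are hypothetical, not Hive's actual Driver/TaskRunner API): on an invalid query, return a non-zero code instead of calling System.exit, and also cancel already-submitted tasks so they stop consuming resources.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not Hive's real API): on a bad query, return an error
// code instead of calling System.exit(9), AND stop in-flight tasks so they
// do not keep running after the query has already failed.
public class GracefulFailureSketch {

    public static int execute(ExecutorService taskPool, boolean queryIsValid)
            throws InterruptedException {
        if (!queryIsValid) {
            // Returning 9 (instead of System.exit(9)) keeps the server JVM
            // alive, but we must also cancel submitted tasks, or they keep
            // consuming cluster resources as the comment above points out.
            taskPool.shutdownNow(); // interrupts running tasks
            taskPool.awaitTermination(10, TimeUnit.SECONDS);
            return 9; // error code propagated to the caller; JVM keeps running
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { /* stand-in for a long-running task */ });
        int rc = execute(pool, false);
        System.out.println("return code: " + rc + ", pool stopped: " + pool.isShutdown());
    }
}
```

The design point is that error handling in a long-lived server has two halves: signalling the failure (the return code) and reclaiming the resources the failed query already acquired (the shutdown of its tasks).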
Review Request: Few code improvements in the ql and serde packages.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/535/ ---

Review request for hive.

Summary
---
A few code improvements in the ql and serde packages:
1) Small performance improvements
2) Null checks to avoid NPEs
3) Effective variable management

This addresses bug HIVE-2080. https://issues.apache.org/jira/browse/HIVE-2080

Diffs
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FilterOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/UnionOperator.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ASTNode.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1086514
- trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1086514
- trunk/serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeField.java 1086514
- trunk/serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeFieldType.java 1086514
- trunk/serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDeFunction.java 1086514

Diff: https://reviews.apache.org/r/535/diff

Testing
---
Ran the tests. All tests passed.

Thanks,
chinna
[jira] [Commented] (HIVE-2080) Few code improvements in the ql and serde packages.
[ https://issues.apache.org/jira/browse/HIVE-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014005#comment-13014005 ]

jirapos...@reviews.apache.org commented on HIVE-2080:

(This comment mirrors the review request at https://reviews.apache.org/r/535/ above; see that message for the summary, diffs, and testing notes.)

Few code improvements in the ql and serde packages.

Key: HIVE-2080
URL: https://issues.apache.org/jira/browse/HIVE-2080
Project: Hive
Issue Type: Bug
Components: Query Processor, Serializers/Deserializers
Affects Versions: 0.7.0
Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Attachments: HIVE-2080.Patch

A few code improvements in the ql and serde packages:
1) Small performance improvements
2) Null checks to avoid NPEs
3) Effective variable management
Build failed in Jenkins: Hive-0.7.0-h0.20 #60
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/60/

--
Started by timer
Building remotely on ubuntu1
Cleaning workspace https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/
Checking out http://svn.apache.org/repos/asf/hive/branches/branch-0.7
ERROR: Failed to check out http://svn.apache.org/repos/asf/hive/branches/branch-0.7
org.tmatesoft.svn.core.SVNException: svn: connection refused by the server
svn: OPTIONS request failed on '/repos/asf/hive/branches/branch-0.7'
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:106)
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:90)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:629)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:275)
at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:263)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:516)
at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:98)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1001)
at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:178)
at org.tmatesoft.svn.core.wc.SVNBasicClient.getRevisionNumber(SVNBasicClient.java:482)
at org.tmatesoft.svn.core.wc.SVNBasicClient.getLocations(SVNBasicClient.java:873)
at org.tmatesoft.svn.core.wc.SVNBasicClient.createRepository(SVNBasicClient.java:534)
at org.tmatesoft.svn.core.wc.SVNUpdateClient.doCheckout(SVNUpdateClient.java:901)
at hudson.scm.subversion.CheckoutUpdater$1.perform(CheckoutUpdater.java:83)
at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:137)
at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:725)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:706)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:690)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:1944)
at hudson.remoting.UserRequest.perform(UserRequest.java:114)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:270)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at org.tmatesoft.svn.core.internal.util.SVNSocketConnection.run(SVNSocketConnection.java:57)
... 1 more
Recording test results
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014065#comment-13014065 ]

Krishna Kumar commented on HIVE-2065:

The RCFile layout seems to have been designed initially to be compatible with SequenceFile, but over time (especially due to the key-compression enhancement?) it has drifted away. The compatibility intent goes as far as keeping certain boolean values always false (blockCompression), but a couple of bugs have been introduced along the way, whereby the record length is no longer the on-disk record length and the key-length field is no longer the on-disk key length. Once I started writing a unit test to ensure that the RCFile layout stays in sync with the SequenceFile layout, I also found that the classes designated as the key class/value class are no longer able to read themselves in or write themselves out, even if properly 'primed'. That is the primary aim of the changes under #3.

[PS: The reason I am looking into this now is to experiment with column-specific compression ('use this codec for this sorted, numeric column') or type-specific compression ('use this codec for all enumeration types of this table'). Presumably, if successful, this information will be put into metadata, as I am doing with the generic codec in the changes above.]

RCFile issues

Key: HIVE-2065
URL: https://issues.apache.org/jira/browse/HIVE-2065
Project: Hive
Issue Type: Bug
Reporter: Krishna Kumar
Assignee: Krishna Kumar
Priority: Minor
Attachments: HIVE.2065.patch.0.txt, Slide1.png, proposal.png

Some potential issues with RCFile:

1. Remove unwanted synchronized modifiers on the methods of RCFile. Per Yongqiang He, the class is not meant to be thread-safe (and it is not), so we might as well get rid of the confusing and performance-impacting lock acquisitions.

2. Record length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}

out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength);               // key portion length

if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}

3. For SequenceFile compatibility, the compressed key length should be the field immediately following the record length, not the uncompressed key length.
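Points 2 and 3 can be illustrated with a self-contained sketch (the class and method names here are hypothetical, not RCFile's actual code): once the key is deflated, a record-length field computed from the uncompressed key size no longer matches the bytes actually written to disk.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Illustration of the length mismatch described above (not RCFile's real API):
// compare the record length the writer *states* in its header against the
// bytes it actually puts on disk once the key is compressed.
public class KeyLengthSketch {

    // Deflate a byte array with the default zlib settings.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Record length as written in the snippet above:
    // uncompressed key length + value length.
    static int statedRecordLength(byte[] key, byte[] value) {
        return key.length + value.length;
    }

    // Bytes that actually land on disk after the length fields:
    // the *compressed* key plus the value.
    static int onDiskRecordLength(byte[] key, byte[] value) {
        return deflate(key).length + value.length;
    }
}
```

For a compressible key (say, 1 KB of repeated bytes) statedRecordLength comes out larger than onDiskRecordLength, which is exactly the "record length overstated" problem in point 2; a SequenceFile-compatible layout would record the compressed size instead.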
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014090#comment-13014090 ]

Mac Yang commented on HIVE-2084:

Ning, I will test this patch with our usual setup and let you know how it goes.
[jira] [Commented] (HIVE-1872) Hive process is exiting on executing ALTER query
[ https://issues.apache.org/jira/browse/HIVE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014096#comment-13014096 ]

John Sichi commented on HIVE-1872:

Good point about the Hive server. As a short-term fix for the flaky tests, adding a config param that is only enabled for tests should still work fine, but I agree that it would be best to address the bigger issues.
[jira] [Created] (HIVE-2086) Data loss with external table
Data loss with external table

Key: HIVE-2086
URL: https://issues.apache.org/jira/browse/HIVE-2086
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.7.0
Environment: Amazon Elastic MapReduce cluster
Reporter: Q Long

Data loss when using a "create external table ... like" statement:

1) Set up an external table S pointing to location L. Populate data in S.
2) Create another external table T with a statement like: create external table T like S location L. Make sure table T points to the same location as the original table S.
3) Query table T; you see the same set of data as in S.
4) Drop table T.
5) Querying table S now returns nothing, and location L has been deleted.
[jira] [Commented] (HIVE-2086) Data loss with external table
[ https://issues.apache.org/jira/browse/HIVE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014109#comment-13014109 ]

Edward Capriolo commented on HIVE-2086:

Dropping an external table should not delete data. Are you saying that 'create table like' does not preserve the external property?
[jira] [Created] (HIVE-2087) Dynamic partition insert performance problem
Dynamic partition insert performance problem

Key: HIVE-2087
URL: https://issues.apache.org/jira/browse/HIVE-2087
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.7.0
Environment: Amazon EMR, S3
Reporter: Q Long

Create an external (S3-backed) table T, partitioned by column P. Populate table T so it has a large number of partitions (say 100). Execute a statement like:

insert overwrite table T partition (p) select * from another_table

The Hive server log shows that all existing partitions are read and loaded before any mapper starts working. This feels excessive, given that the insert statement may only create or overwrite a very small number of partitions. Is there another reason that an insert using dynamic partitions requires loading the whole table?
[jira] [Commented] (HIVE-2086) Data loss with external table
[ https://issues.apache.org/jira/browse/HIVE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014126#comment-13014126 ]

Q Long commented on HIVE-2086:

It seems that "create external table ... like" does not preserve the external property. Note that both the original table S and the new table T are external, and the data loss only occurs when creating T with the statement "create external table T like S location L". There is no data loss if T is created with full table definitions (i.e., without the "like" clause).
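A minimal sketch of the guard implied by this discussion: Hive marks external tables with the table parameter EXTERNAL=TRUE, and only a managed (non-external) table's data should be removed on drop. The class and helper below are hypothetical, for illustration only; the bug report suggests the LIKE path produced a table whose metadata lost this flag, so the drop path deleted location L out from under table S.

```java
import java.util.Map;

// Hypothetical sketch (not Hive's actual drop-table code): decide whether the
// data directory may be deleted by checking the EXTERNAL table parameter,
// which Hive sets to TRUE for external tables.
public class DropGuardSketch {

    static boolean shouldDeleteData(Map<String, String> tableParams) {
        // Any table explicitly flagged EXTERNAL=TRUE keeps its data on drop;
        // everything else is treated as managed, so its directory is removed.
        return !"TRUE".equalsIgnoreCase(tableParams.get("EXTERNAL"));
    }
}
```

With such a guard, a table created via "create external table ... like" is only safe if the LIKE path copies the EXTERNAL parameter into the new table's metadata, which is the property the reporter found missing.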
Build failed in Jenkins: Hive-trunk-h0.20 #650
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/650/

--
[...truncated 29878 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-31_12-18-19_223_5644671997450329108/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0
[junit] 2011-03-31 12:18:22,287 null map = 100%, reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-31_12-18-19_223_5644671997450329108/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103311218_2034159440.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-31_12-18-23_706_5212736400386072146/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-31_12-18-23_706_5212736400386072146/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103311218_408920310.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014166#comment-13014166 ]

Namit Jain commented on HIVE-2084:

@Ning/Paul, do you know whether datanucleus 2.2.3 can support filter pushdown for non-equality predicates?
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014168#comment-13014168 ] Namit Jain commented on HIVE-2084: -- +1. The code changes look good, but I will wait for confirmation from Mac before checking it in.
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014182#comment-13014182 ] Ning Zhang commented on HIVE-2084: -- @Namit, yes, 2.2.3 supports filter pushdown for non-equality predicates; even the older 2.0.3 supports it too. Mac's patch actually supports range queries, but since range queries can be complicated across multiple partition columns (what if the range is on a column that is not the top partition column?), I didn't dig deep into it; the pushdown filtering criteria can certainly be relaxed, though. That said, my test results show that JDO filter pushdown may not be the dominant factor (compared to the patch in HIVE-2050). In the experiments I ran for HIVE-2050, listing partition names and filtering partitions on the Hive client side may take 10 seconds, but retrieving all Partition objects takes about 10 minutes in total. At best, pushing down the JDO filter reduces the 10 seconds to zero, but the 10-minute overhead is still there. We need to find a way to optimize that away.
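The 10-second client-side step Ning mentions can be pictured as filtering partition *names* before any Partition objects are fetched from the metastore. The following is a minimal standalone sketch (a hypothetical helper, not Hive code) showing why a range on the top partition column is the easy case:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionNameFilter {
    // Hypothetical sketch of client-side partition pruning: keep only the
    // partition names whose top-level column value falls in [lo, hi].
    // Partition names are assumed to look like "ds=2011-03-31/hr=12".
    static List<String> filterByTopColumn(List<String> partNames,
                                          String col, String lo, String hi) {
        List<String> kept = new ArrayList<>();
        for (String name : partNames) {
            // Only the first (top-level) key=value component is inspected.
            String[] kv = name.split("/")[0].split("=", 2);
            if (kv[0].equals(col)
                    && kv[1].compareTo(lo) >= 0 && kv[1].compareTo(hi) <= 0) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

A range on a non-top column would require parsing every component of every name, which is the complication Ning alludes to; and whatever happens here, the full Partition objects for the surviving names still have to be fetched afterwards.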
[jira] [Updated] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2084: - Status: Open (was: Patch Available) @Ning: Why did you modify the class mapping in package.jdo? Does this require a metastore upgrade script?
[jira] [Commented] (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014192#comment-13014192 ] John Sichi commented on HIVE-1555: -- Thanks a lot, I've linked your PDF directly from the [[Hive/DesignDocs]] wiki page. JDBC Storage Handler Key: HIVE-1555 URL: https://issues.apache.org/jira/browse/HIVE-1555 Project: Hive Issue Type: New Feature Components: JDBC Reporter: Bob Robertson Assignee: Andrew Wilson Attachments: JDBCStorageHandler Design Doc.pdf Original Estimate: 24h Remaining Estimate: 24h With the Cassandra and HBase Storage Handlers I thought it would make sense to include a generic JDBC RDBMS Storage Handler so that you could import a standard DB table into Hive. Many people must want to perform HiveQL joins, etc against tables in other systems etc.
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014240#comment-13014240 ] Ning Zhang commented on HIVE-2084: -- @Carl, one change (at line 49) in package.jdo fixes a bug that was not exposed by the old datanucleus version. Without the change, datanucleus throws an exception at runtime (FCOMMENT is not a column of the COLUMNS table). I guess the old version of datanucleus didn't check the MFieldSchema mapping in package.jdo and only retrieved the columns mentioned in the embedded elements. The other changes make the legacy mappings conform to the current relational schema (e.g., MFieldSchema.FNAME should be mapped to COLUMNS.COLUMN_NAME). They do not currently cause any runtime exceptions, but it's better to fix them proactively if we are sure the relational mapping is wrong.
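For readers unfamiliar with package.jdo, the kind of mapping under discussion looks roughly like the following. This is an illustrative fragment with plausible attribute values, not the exact HIVE-2084 diff; the point is that each embedded MFieldSchema field must name a column that actually exists in the COLUMNS table:

```xml
<!-- Illustrative sketch only, not the committed package.jdo change -->
<field name="cols" table="COLUMNS">
  <collection element-type="MFieldSchema"/>
  <join/>
  <element>
    <embedded>
      <field name="name">
        <column name="COLUMN_NAME" length="128" allows-null="false"/>
      </field>
      <field name="type">
        <column name="TYPE_NAME" length="4000" allows-null="true"/>
      </field>
      <field name="comment">
        <column name="COMMENT" length="256" allows-null="true"/>
      </field>
    </embedded>
  </element>
</field>
```

If an embedded field here referenced a non-existent column (as with FCOMMENT above), older DataNucleus versions silently ignored it, while 2.2.3 validates the mapping and fails at runtime.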
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.9.patch Uploaded new patch that addresses John's comments on patch 8. Implement bitmap indexing in Hive - Key: HIVE-1803 URL: https://issues.apache.org/jira/browse/HIVE-1803 Project: Hive Issue Type: New Feature Components: Indexing Reporter: Marquis Wang Assignee: Marquis Wang Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, HIVE-1803.8.patch, HIVE-1803.9.patch, JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, javaewah.jar Implement bitmap index handler to complement compact indexing.
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-1803) Implement bitmap indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marquis Wang updated HIVE-1803: --- Attachment: HIVE-1803.10.patch Update patch to include more missing javadocs.
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014294#comment-13014294 ] Carl Steinbach commented on HIVE-2084: -- bq. One change (at line 49) in package.jdo fixes a bug that was not exposed by the old datanucleus version. Without the change, datanucleus throws an exception at runtime (FCOMMENT is not a column of the COLUMNS table). I guess the old version of datanucleus didn't check the MFieldSchema mapping in package.jdo and only retrieved the columns mentioned in the embedded elements. Yup, looks like that's the case. It also looks like Datanucleus was ignoring the size of the FCOMMENTS field, so the older versions of TYPE_FIELDS.COMMENT and COLUMNS.COMMENT have size 256, which must be the default value. In the new schema these fields both get bumped to 4000 bytes, which is the correct size. Can you please include upgrade scripts that update the size of these columns accordingly? Also, as far as I can tell the change to the MOrder mapping has no effect, since it is only referenced by the SORT_COLS table, which overrides the name to COLUMN_NAME instead.
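The upgrade script Carl requests would amount to something like the following. This is a hedged sketch only: MySQL syntax is assumed, the table and column names come from the comment above, and the committed script (which would live with the other metastore upgrade scripts) may differ:

```sql
-- Hypothetical upgrade fragment, not the committed script:
-- widen the comment columns from the old 256-byte default to 4000 bytes.
ALTER TABLE COLUMNS     MODIFY COMMENT VARCHAR(4000);
ALTER TABLE TYPE_FIELDS MODIFY COMMENT VARCHAR(4000);
```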
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014302#comment-13014302 ] Mac Yang commented on HIVE-2084: Test has been running for about six hours without failure. Looks good.
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014351#comment-13014351 ] Krishna Kumar commented on HIVE-2065: - As I indicated, the reason I made changes for sequence file compatibility is that the original design was more or less compatible, but the compatibility is now broken. If we decide that compatibility is not a requirement, I am fine with that; a few documentation changes will be all that is necessary to describe that situation. The current layout itself, with the SEQ prefix, incorrect key length, keyclass/valueclass header, etc., will not make much sense, but we can designate all that 'legacy' ;) I'd like to move the discussion of potential approaches for better compression to another thread; is there an existing JIRA, or should I open a new one? RCFile issues - Key: HIVE-2065 URL: https://issues.apache.org/jira/browse/HIVE-2065 Project: Hive Issue Type: Bug Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor Attachments: HIVE.2065.patch.0.txt, Slide1.png, proposal.png Some potential issues with RCFile 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions. 2. Record Length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}
out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength);               // key portion length
if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}
3. For sequence file compatibility, the compressed key length should be the field following the record length, not the uncompressed key length.
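Point 2 above (the overstated record length) can be illustrated with a self-contained sketch. This is not the RCFile code itself, just a demonstration that both header fields should count the compressed key bytes actually written to the stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;

public class RecordHeaderSketch {
    // Hedged sketch, not RCFile: compress the key first, then write the
    // headers so that "total record length" and "key portion length" both
    // refer to the compressed bytes that actually follow on the stream.
    static byte[] writeRecord(byte[] key, byte[] value) {
        try {
            ByteArrayOutputStream keyBuf = new ByteArrayOutputStream();
            DeflaterOutputStream deflate = new DeflaterOutputStream(keyBuf);
            deflate.write(key);
            deflate.finish();
            byte[] compressedKey = keyBuf.toByteArray();

            ByteArrayOutputStream record = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(record);
            out.writeInt(compressedKey.length + value.length); // total record length
            out.writeInt(compressedKey.length);                // key portion length
            out.write(compressedKey);
            out.write(value);
            return record.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams should not fail
        }
    }

    // Read a big-endian int, matching what DataOutputStream.writeInt produced.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }
}
```

With this layout the declared record length equals the byte count on the wire, whereas writing the uncompressed key length (as quoted above) overstates it whenever compression shrinks the key.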
Build failed in Jenkins: Hive-0.7.0-h0.20 #61
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/61/ -- [...truncated 27372 lines...] [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103311907_750985713.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-31_19-07-42_271_3265981684882944171/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set 
mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] 2011-03-31 19:07:45,321 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-31_19-07-42_271_3265981684882944171/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103311907_347201241.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD 
[junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/hudson/hive_2011-03-31_19-07-46_874_7482268897505483011/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/hudson/hive_2011-03-31_19-07-46_874_7482268897505483011/-mr-1 [junit] OK [junit] PREHOOK: query: drop table
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014361#comment-13014361 ] He Yongqiang commented on HIVE-2065: Let's set the compatibility issue aside and fix the incorrect length issue in this jira. Feel free to open a new jira for the discussion of better compression.
[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014394#comment-13014394 ] Russell Melick commented on HIVE-1644: -- I'm having trouble getting the partitions from an Index. I do not know how to get back to the index table, so I cannot use getPartCols(). I would like to do something like this, but I don't know how to get the indexTable.
{code:java}
for (Index index : indexes.get(part.getTable())) {
  Table indexTable;
  indexTable = ???
  List<FieldSchema> indexPartitions = indexTable.getPartCols();
  for (FieldSchema col : part.getCols()) {
    if (!indexPartitions.contains(col)) {
      return null;
    }
  }
}
{code}
use filter pushdown for automatically accessing indexes --- Key: HIVE-1644 URL: https://issues.apache.org/jira/browse/HIVE-1644 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.7.0 Reporter: John Sichi Assignee: Russell Melick Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014398#comment-13014398 ] He Yongqiang commented on HIVE-1644: You have the list of partitions for the original table; you just need to find out whether those partition names exist on the index table. So getParitionByName() (pls check the code to find out the exact name) should work.
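Yongqiang's suggestion boils down to a containment check on partition names rather than comparing partition columns. A trivial sketch (hypothetical method and parameter names, not the Hive metastore API):

```java
import java.util.Collection;
import java.util.Set;

public class IndexPartitionCheck {
    // Hedged sketch: an index is usable for the query only if every partition
    // of the base table also exists (by name) on the index table. The caller
    // would obtain indexPartNames via something like getPartitionByName()
    // on the index table, as suggested above.
    static boolean indexCoversPartitions(Collection<String> basePartNames,
                                         Set<String> indexPartNames) {
        return indexPartNames.containsAll(basePartNames);
    }
}
```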
Hive Client is waiting indefinitely on a socket read
Hi All, the Hive client is waiting indefinitely on a socket read; the thread dump is included below. Cause: when the HiveClient's socket is created, the read timeout is set to 0, so the socket will wait indefinitely when the machine running the Hive Server is shut down or the network is unplugged. The same may not happen if the HiveServer alone is killed or gracefully shut down; in that case the client gets a connection reset exception. Code in HiveConnection ---
transport = new TSocket(host, port);
TProtocol protocol = new TBinaryProtocol(transport);
client = new HiveClient(protocol);
On the client side, the query is sent and then the client waits for the response:
send_execute(query, id);
recv_execute(); // place where client waiting is initiated
We cannot simply give the socket a timeout either, because a query may run for a long time and its execution time cannot be predetermined. Any suggestions for fixing this issue?
Thread dump:
"main" prio=10 tid=0x40111000 nid=0x3641 runnable [0x7f0d73f29000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	- locked <0x7f0d5d3f0828> (a java.io.BufferedInputStream)
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:125)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
	at org.apache.hadoop.hive.service.ThriftHive$Client.recv_execute(ThriftHive.java:130)
	at org.apache.hadoop.hive.service.ThriftHive$Client.execute(ThriftHive.java:109)
	- locked <0x7f0d5d3f0878> (a org.apache.thrift.transport.TSocket)
	at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:218)
	at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:154)
	at com.huawei.isap.i3.HiveJdbcClient.main(HiveJdbcClient.java:114)
Thanks & Regards, Chinna Rao Lalam
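One possible direction (a sketch only, not Hive code): keep the socket timeout unbounded, but run the blocking receive on a worker thread and poll its Future with a bounded wait, giving the client a hook to probe server liveness between waits instead of blocking forever. All names here are hypothetical:

```java
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {
    // Hedged sketch: poll a pending result with a bounded wait. Between
    // waits a real client could ping the server (e.g. with a cheap RPC)
    // and abort early if the server is unreachable, while still allowing
    // legitimately long-running queries up to maxChecks * checkEveryMs.
    static <T> T waitWithHeartbeat(Future<T> pending, long checkEveryMs, int maxChecks)
            throws Exception {
        for (int i = 0; i < maxChecks; i++) {
            try {
                return pending.get(checkEveryMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Not done yet: this is where a liveness probe would go.
            }
        }
        throw new TimeoutException("server did not respond");
    }
}
```

The trade-off is choosing the liveness probe: it must be cheap and must not itself block without a timeout, otherwise the original problem reappears.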