[jira] [Updated] (HIVE-2110) Hive Client is indefenitely waiting for reading from Socket

2011-08-11 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-2110:
--

Fix Version/s: 0.8.0
   Status: Patch Available  (was: Open)

 Hive Client is indefenitely waiting for reading from Socket
 ---

 Key: HIVE-2110
 URL: https://issues.apache.org/jira/browse/HIVE-2110
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.5.0
 Environment: Hadoop 0.20.1, Hive0.5.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Prasad Mujumdar
 Fix For: 0.8.0


 The Hive client waits indefinitely when reading from the socket. A thread dump is 
 added below.
 Cause:
  
   In the Hive client, when the client socket is created, the read timeout is 
 set to 0, which means no timeout. So a read blocks indefinitely when the machine where 
 the Hive Server is running is shut down or the network is unplugged. The same may 
 not happen if only the HiveServer process is killed or gracefully shut down. In that 
 case the client gets a connection reset exception. 
 Code in HiveConnection
 ---
 {noformat}
 transport = new TSocket(host, port);
 TProtocol protocol = new TBinaryProtocol(transport); 
 client = new HiveClient(protocol);
 {noformat}
 On the client side, the query is sent and the client then waits for the response: 
 send_execute(query,id); recv_execute(); // place where the client starts 
 waiting
 Thread dump:
 {noformat}
 main prio=10 tid=0x40111000 nid=0x3641 runnable [0x7f0d73f29000]
   java.lang.Thread.State: RUNNABLE
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:129)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
   locked 0x7f0d5d3f0828 (a java.io.BufferedInputStream)
   at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:125)
   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
   at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
   at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
   at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
   at org.apache.hadoop.hive.service.ThriftHive$Client.recv_execute(ThriftHive.java:130)
   at org.apache.hadoop.hive.service.ThriftHive$Client.execute(ThriftHive.java:109)
   locked 0x7f0d5d3f0878 (a org.apache.thrift.transport.TSocket)
   at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:218)
   at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:154)
 {noformat}
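
 The effect of a read timeout can be reproduced with plain java.net sockets. The sketch below is an illustration only and is not taken from the attached patch: a local server accepts a connection but never replies, and the client read gives up after the configured timeout instead of blocking forever. The class and method names are made up for this example. Thrift's TSocket also has a constructor that takes a timeout in milliseconds, which is presumably the relevant knob for HiveConnection, but whether the patch uses it is not shown in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutSketch {
    /**
     * Connects to a local server that never sends anything, then tries to
     * read one byte. Returns true if the read ended with a
     * SocketTimeoutException, false if it returned for any other reason.
     */
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort())) {
            // With a timeout of 0 (the default) the read below would block indefinitely.
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read();
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

 A caller that sees the timeout can then close the connection and report an error, rather than hanging in recv_execute() as in the thread dump above.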

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2110) Hive Client is indefenitely waiting for reading from Socket

2011-08-11 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-2110:
--

Attachment: HIVE-2110.patch





Jenkins build is back to normal : Hive-trunk-h0.21 #887

2011-08-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-trunk-h0.21/887/changes




[jira] [Commented] (HIVE-2346) Add hooks to run when execution fails.

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083013#comment-13083013
 ] 

Hudson commented on HIVE-2346:
--

Integrated in Hive-trunk-h0.21 #887 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/887/])
HIVE-2346. Add hooks to run when execution fails. (Kevin Wilfong via Ning 
Zhang)

nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156480
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/conf/hive-default.xml


 Add hooks to run when execution fails.
 --

 Key: HIVE-2346
 URL: https://issues.apache.org/jira/browse/HIVE-2346
 Project: Hive
  Issue Type: Improvement
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.8.0

 Attachments: HIVE-2346.1.patch.txt, HIVE-2346.2.patch.txt, 
 HIVE-2346.3.patch.txt


 Currently, when a query fails, the Post Execution Hooks are not run.
 Adding hooks to be run when a query fails could allow for better logging etc.
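
 The idea can be sketched in a few lines of Java. This is a hypothetical illustration, not the code from the attached patches: the Hook interface and the execute method are invented for the example and do not mirror Hive's actual hook classes. It shows post-execution hooks running only on success and a separate list of failure hooks running when the query throws.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FailureHookSketch {
    /** Hypothetical hook interface; not Hive's actual hook API. */
    interface Hook {
        void run(String query);
    }

    /** Runs the query body, then the post hooks on success or the failure hooks on error. */
    static boolean execute(String query, Runnable body, List<Hook> postHooks, List<Hook> failureHooks) {
        try {
            body.run();
        } catch (RuntimeException e) {
            for (Hook h : failureHooks) {
                h.run(query);
            }
            return false;
        }
        for (Hook h : postHooks) {
            h.run(query);
        }
        return true;
    }

    /** Runs one successful and one failing query and returns which hooks fired. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        List<Hook> post = Collections.singletonList(q -> log.add("post:" + q));
        List<Hook> fail = Collections.singletonList(q -> log.add("fail:" + q));
        execute("ok", () -> { }, post, fail);
        execute("bad", () -> { throw new IllegalStateException("boom"); }, post, fail);
        return log;
    }
}
```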





[jira] [Assigned] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2011-08-11 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam reassigned HIVE-2181:
--

Assignee: Chinna Rao Lalam

  Clean up the scratch.dir (tmp/hive-root) while restarting Hive server. 
 

 Key: HIVE-2181
 URL: https://issues.apache.org/jira/browse/HIVE-2181
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Suse linux, Hadoop 20.1, Hive 0.8
Reporter: sanoj mathew
Assignee: Chinna Rao Lalam
Priority: Minor
  Labels: patch
 Fix For: 0.8.0

 Attachments: HIVE-2181.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently, queries leave their map outputs under scratch.dir after execution. If the 
 Hive server is stopped, we need not keep the stopped server's map outputs. So 
 while starting the server we can clear the scratch.dir. This can help 
 improve disk usage.





Minor typo in error message

2011-08-11 Thread Clément Notin
Hello,

I'm new here and I just wanted to report a minor typo issue in
HiveConnection.java (jdbc) :

throw new SQLException("Could not establish connecton to "
+ uri + ": " + e.getMessage(), "08S01");

It seems like there's an "i" missing ;)


Hive is a nice work BTW ! Thanks.

-- 
*Clément **Notin*


Re: Minor typo in error message

2011-08-11 Thread Jakob Homan
Hey Clément-
   Thanks for the report - would you be able to open a JIRA for it
(https://issues.apache.org/jira/browse/HIVE)? I'm sure someone will
whip up a patch shortly, or if you're interested in contributing, I'd
invite you to create one.

Thanks,
Jakob




[jira] [Updated] (HIVE-2315) DatabaseMetadata.getColumns() does not return partition column names for a table

2011-08-11 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated HIVE-2315:
---

Attachment: (was: HIVE-2315_part2.patch)

 DatabaseMetadata.getColumns() does not return partition column names for a 
 table
 

 Key: HIVE-2315
 URL: https://issues.apache.org/jira/browse/HIVE-2315
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.7.1
Reporter: Mythili Gopalakrishnan
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 0.8.0

 Attachments: HIVE-2315.patch


 The getColumns() method of DatabaseMetadata for the Hive JDBC driver does not return 
 the partition column names, whereas from the Hive CLI, if you do a 'describe 
 tablename' you get all columns including the partition columns. It would be 
 nice if the getColumns() method returned all columns.





[jira] [Updated] (HIVE-2315) DatabaseMetadata.getColumns() does not return partition column names for a table

2011-08-11 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated HIVE-2315:
---

Attachment: HIVE-2315.patch

Updated to a single patch with fix and both sets of tests.





Review Request: HIVE-2315 DatabaseMetadata.getColumns() does not return partition column names for a table

2011-08-11 Thread Patrick Hunt

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1468/
---

Review request for hive and Carl Steinbach.


Summary
---

This patch fixes the problem and adds a couple of tests.


This addresses bug HIVE-2315.
https://issues.apache.org/jira/browse/HIVE-2315


Diffs
-

  jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDatabaseMetaData.java d570fca 
  jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java d72cf43 

Diff: https://reviews.apache.org/r/1468/diff


Testing
---

units pass, a user also verified it fixed the issue they were seeing.


Thanks,

Patrick



[jira] [Updated] (HIVE-2315) DatabaseMetadata.getColumns() does not return partition column names for a table

2011-08-11 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated HIVE-2315:
---

Status: Patch Available  (was: Open)





[jira] [Commented] (HIVE-2315) DatabaseMetadata.getColumns() does not return partition column names for a table

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083257#comment-13083257
 ] 

jirapos...@reviews.apache.org commented on HIVE-2315:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1468/
---

Review request for hive and Carl Steinbach.


Summary
---

This patch fixes the problem and adds a couple of tests.


This addresses bug HIVE-2315.
https://issues.apache.org/jira/browse/HIVE-2315


Diffs
-

  jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDatabaseMetaData.java d570fca 
  jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java d72cf43 

Diff: https://reviews.apache.org/r/1468/diff


Testing
---

units pass, a user also verified it fixed the issue they were seeing.


Thanks,

Patrick







Re: Minor typo in error message

2011-08-11 Thread Clément Notin
It's done. HIVE-2369 https://issues.apache.org/jira/browse/HIVE-2369
(I didn't have the will to create an account, it's done now !)


-- 
*Clément **Notin*


[jira] [Created] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)

2011-08-11 Thread JIRA
Minor typo in error message in HiveConnection.java (JDBC)
-

 Key: HIVE-2369
 URL: https://issues.apache.org/jira/browse/HIVE-2369
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.7.1, 0.8.0
 Environment: Linux
Reporter: Clément Notin
Priority: Trivial


There is a minor typo issue in HiveConnection.java (jdbc) :

{code}throw new SQLException("Could not establish connecton to "
+ uri + ": " + e.getMessage(), "08S01");{code}

It seems like there's an "i" missing.

I know it's a very minor typo but I report it anyway. I won't attach a patch 
because it would be too long for me to SVN checkout just for 1 letter.
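
For reference, the corrected statement might look like the sketch below. The wrapper class and method are invented so that the snippet compiles on its own, and the surrounding HiveConnection code is not reproduced. SQLException's two-argument constructor takes the reason and the SQLSTATE, and 08S01 is the standard SQLSTATE for a communication link failure.

```java
import java.sql.SQLException;

final class ConnectionErrorSketch {
    /** Builds the exception with the corrected spelling of "connection". */
    static SQLException connectionFailure(String uri, Exception cause) {
        return new SQLException(
            "Could not establish connection to " + uri + ": " + cause.getMessage(),
            "08S01");
    }
}
```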





[jira] [Updated] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)

2011-08-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clément Notin updated HIVE-2369:


Status: Open  (was: Patch Available)

 Minor typo in error message in HiveConnection.java (JDBC)
 -

 Key: HIVE-2369
 URL: https://issues.apache.org/jira/browse/HIVE-2369
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.7.1, 0.8.0
 Environment: Linux
Reporter: Clément Notin
Priority: Trivial
   Original Estimate: 2m
  Remaining Estimate: 2m





[jira] [Updated] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)

2011-08-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clément Notin updated HIVE-2369:


Status: Patch Available  (was: Open)

Easy patch





[jira] [Commented] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)

2011-08-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083279#comment-13083279
 ] 

Clément Notin commented on HIVE-2369:
-

I wrote the patch on GitHub and made a pull request. You can get it there.





[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2011-08-11 Thread MIS (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083306#comment-13083306
 ] 

MIS commented on HIVE-2181:
---

-1 for the issue.
What if I'm running multiple Hive servers on different ports on the same machine 
{with my metastore db on a MySQL server}? If one of the server instances 
restarts, it would end up deleting the scratch dir, which would affect the other 
running instances as well. Even if we specify a different scratch dir for each of 
the instances, I doubt the value added by this property.





[hive] Minor typo connecton -> connection. (#3)

2011-08-11 Thread ClementNotin
Fixes HIVE-2369

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/hive/pull/3


[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2011-08-11 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083401#comment-13083401
 ] 

Chinna Rao Lalam commented on HIVE-2181:


Hi MIS,  
   Thanks for the point. As you said, if multiple instances on the same machine share the same 
scratch dir, this won't help. But if each instance is given a different 
scratch dir, it may help (I will double check this point).

   I will introduce a property for this, like hive.start.cleanup.scrachdir. 
The cleanup will be triggered based on this property's value. By default it will be 
turned off. If cleanup is needed while starting the server, turn it on. 
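
A minimal sketch of such an opt-in cleanup, using only java.nio.file, could look like the following. It is an assumption-laden illustration rather than the contents of HIVE-2181.patch: the class and method names are invented, a plain java.util.Properties object stands in for the Hive configuration, and the property name is copied as written in the comment above, so the final name may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Properties;
import java.util.stream.Stream;

final class ScratchDirCleanupSketch {
    // Property name as proposed in the comment above; treat it as a placeholder.
    static final String CLEANUP_PROPERTY = "hive.start.cleanup.scrachdir";

    /** Empties scratchDir (keeping the directory itself) if the property is "true". Returns whether cleanup ran. */
    static boolean cleanupOnStart(Properties conf, Path scratchDir) {
        boolean enabled = Boolean.parseBoolean(conf.getProperty(CLEANUP_PROPERTY, "false"));
        if (!enabled || !Files.isDirectory(scratchDir)) {
            return false;
        }
        try (Stream<Path> paths = Files.walk(scratchDir)) {
            // Reverse order so that children are deleted before their parent directories.
            paths.sorted(Comparator.reverseOrder())
                 .filter(p -> !p.equals(scratchDir))
                 .forEach(ScratchDirCleanupSketch::delete);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    private static void delete(Path p) {
        try {
            Files.delete(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temp scratch dir with one nested file, runs the cleanup, and returns how many entries remain. */
    static long demoRemainingEntries(boolean enabled) {
        try {
            Path dir = Files.createTempDirectory("scratch-sketch");
            Files.createFile(Files.createDirectory(dir.resolve("job1")).resolve("map-output"));
            Properties conf = new Properties();
            conf.setProperty(CLEANUP_PROPERTY, Boolean.toString(enabled));
            cleanupOnStart(conf, dir);
            try (Stream<Path> left = Files.list(dir)) {
                return left.count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Defaulting to off keeps the behaviour unchanged for setups where several server instances share one scratch dir, which is the case raised earlier in this thread.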





[jira] [Updated] (HIVE-2344) filter is removed due to regression of HIVE-1538

2011-08-11 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2344:
-

  Resolution: Fixed
Release Note: When predicate pushdown is enabled, Hive would previously 
incorrectly push down predicates on non-deterministic function invocations when 
those were indirectly referenced via a nested SELECT list rather than directly 
in the filter expression.  After this change, Hive no longer pushes down 
filters over indirect references to function invocations of any kind 
(regardless of determinism).  Note that in Hive, even builtin operators such as 
+ and CAST are treated as function invocations.
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed.  Thanks Amareshwari!


 filter is removed due to regression of HIVE-1538
 

 Key: HIVE-2344
 URL: https://issues.apache.org/jira/browse/HIVE-2344
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: He Yongqiang
Assignee: Amareshwari Sriramadasu
 Fix For: 0.8.0

 Attachments: hive-patch-2344-2.txt, hive-patch-2344.txt, 
 ppd_udf_col.q.out.txt


  select * from 
  (
  select type_bucket,randum123
  from (SELECT *, cast(rand() as double) AS randum123 FROM tbl where ds = ...) a
  where randum123 <=0.1)s where s.randum123>0.1 limit 20;
 This is returning results...
 and 
  explain
  select type_bucket,randum123
  from (SELECT *, cast(rand() as double) AS randum123 FROM tbl where ds = ...) a
  where randum123 <=0.1
 shows that there is no filter.





HIVE-1538/HIVE-2344

2011-08-11 Thread John Sichi
I just committed the fix from Amareshwari, so after this gets pushed, it should 
be possible to back out the conf changes which were applied to avoid the bug 
from HIVE-1538.

Read the release notes I added on HIVE-2344 and chime in with a new JIRA issue 
if you think there are cases where it's important to do finer discrimination in 
what kinds of SELECT expressions to allow for ppd...in general it's a 
cost-based optimizer problem.

As an example, consider

select * from
(select f(x,y) as z from t) s 
where z  3;

Before HIVE-1538, there was a bug where we would push down f(x,y)3 even when f 
was non-deterministic.

HIVE-1538 made that bug much more obvious.

HIVE-2344 fixes it, but also prevents the pushdown even in cases where f is 
deterministic.  This is good in some cases (e.g. when f is expensive to compute 
and the filter selectivity is poor), but could be bad in others (e.g. when f is 
something simple like a CAST and the filter is highly selective).
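
To make the non-deterministic case concrete, here is a small Java sketch. It is not Hive code: the class, the alternating stand-in for a non-deterministic f, and the predicate z > 3 are all assumptions chosen for illustration. It evaluates the same query shape twice, once filtering on the projected value and once with the predicate pushed below the projection so that f is evaluated again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

final class PushdownSketch {
    /** Stand-in for a non-deterministic function: successive calls return 0, 10, 0, 10, ... */
    static IntSupplier alternating() {
        int[] calls = {0};
        return () -> (calls[0]++ % 2 == 0) ? 0 : 10;
    }

    /** Filter applied to the projected value: f is evaluated once per row. */
    static List<Integer> filterAfterProjection(int rows, IntSupplier f) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            int z = f.getAsInt();
            if (z > 3) {
                out.add(z);
            }
        }
        return out;
    }

    /** Predicate pushed below the projection: f is evaluated in the filter and again in the projection. */
    static List<Integer> filterPushedDown(int rows, IntSupplier f) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            if (f.getAsInt() > 3) {
                out.add(f.getAsInt());
            }
        }
        return out;
    }
}
```

With four input rows, filterAfterProjection returns only values that satisfy the predicate, while filterPushedDown emits values that do not, because the value tested by the filter is not the value that ends up in the output. For a deterministic f the two plans agree, and the choice between them is only a question of cost.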

JVS



Re: HIVE-1538/HIVE-2344

2011-08-11 Thread John Sichi

On Aug 11, 2011, at 1:21 PM,  wrote:

> I just committed the fix from Amareshwari, so after this gets pushed, it 
> should be possible to back out the conf changes which were applied to avoid 
> the bug from HIVE-1538.

(Oops, ignore this part...no conf changes were applied in Hive source, so this 
was Facebook-specific.)

JVS



Running Hive from Eclipse

2011-08-11 Thread john smith
Hi folks,

I am trying to run Hive from Eclipse. I've set it up correctly and it is
building the jars and stuff. However, I face exceptions when I try to run
Hive queries like show tables etc. There has been a discussion on this in
the mailing list previously but there was no solution provided. It runs
perfectly from the command line.

I am making a few changes to the Hive source and every time I need to jar it
from the command line and run it. Is there some way to run it directly from
Eclipse?

Please help,

Thanks,
JS


Re: Running Hive from Eclipse

2011-08-11 Thread Carl Steinbach
Hi John,

Can you please include the error messages/exceptions that you're
encountering?

Thanks.

Carl




[jira] [Updated] (HIVE-1360) Allow UDFs to access constant parameter values at compile time

2011-08-11 Thread Jonathan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Chang updated HIVE-1360:
-

Attachment: HIVE-1360.patch

 Allow UDFs to access constant parameter values at compile time
 --

 Key: HIVE-1360
 URL: https://issues.apache.org/jira/browse/HIVE-1360
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, UDF
Affects Versions: 0.5.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-1360.patch


 UDFs should be able to access constant parameter values at compile time.





[jira] [Updated] (HIVE-1360) Allow UDFs to access constant parameter values at compile time

2011-08-11 Thread Jonathan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Chang updated HIVE-1360:
-

Status: Patch Available  (was: Open)

HIVE-1360. It has been a long-standing request for UDFs to be able to
access parameter values. This not only enables significant performance
improvement possibilities, it also allows for fundamentally richer
behavior, such as allowing the output type of a UDF to depend on its
inputs.

The strategy in this diff is to introduce the notion of a
ConstantObjectInspector, like a regular ObjectInspector except that it
encapsulates a constant value and knows what this constant value is.
These COIs are created through a factory method by ExprNodeConstantDesc
during plan generation hence UDFs will be able to capture these constant
values during the initialize phase. Furthermore, because these
ConstantObjectInspectors are simply subinterfaces of ObjectInspector,
UDFs which are not constant-aware receive ObjectInspectors which
also implement the same interfaces they are used to, so no special
handling needs to be done for existing UDFs.

An example UDF which uses this new functionality is also included in
this diff. NAMED_STRUCT is like STRUCT except that it also allows users
to specify the names of the fields of the struct, something previously
not possible because the names of the fields must be known at compile
time.
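
The sub-interface idea can be sketched as follows. This is a simplified stand-in and not the code in the diff: the two interfaces below carry only what the example needs, their method names are invented, and initializeNamedStruct is a hypothetical analogue of a UDF's initialize phase that returns the output type as a string.

```java
import java.util.ArrayList;
import java.util.List;

final class ConstantInspectorSketch {
    /** Simplified stand-in for an object inspector: it only reports a type name. */
    interface ObjectInspector {
        String getTypeName();
    }

    /** Sub-interface that also exposes the constant value known at plan time. */
    interface ConstantObjectInspector extends ObjectInspector {
        Object getConstantValue();
    }

    /** Inspector for an ordinary (non-constant) argument. */
    static ObjectInspector column(String type) {
        return () -> type;
    }

    /** Inspector for a constant argument. */
    static ConstantObjectInspector constant(String type, Object value) {
        return new ConstantObjectInspector() {
            @Override public String getTypeName() { return type; }
            @Override public Object getConstantValue() { return value; }
        };
    }

    /**
     * NAMED_STRUCT-like initialization: arguments are name/value pairs, and
     * each name must be a constant so the output type is known at compile time.
     */
    static String initializeNamedStruct(ObjectInspector... args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!(args[i] instanceof ConstantObjectInspector)) {
                throw new IllegalArgumentException("field name at position " + i + " must be a constant");
            }
            Object name = ((ConstantObjectInspector) args[i]).getConstantValue();
            fields.add(name + ":" + args[i + 1].getTypeName());
        }
        return "struct<" + String.join(",", fields) + ">";
    }
}
```

A function that ignores constants can keep treating every argument as a plain ObjectInspector, which is why existing UDFs would need no changes under this design.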

Also see this pull request: https://github.com/apache/hive/pull/2





[jira] [Assigned] (HIVE-1360) Allow UDFs to access constant parameter values at compile time

2011-08-11 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-1360:


Assignee: Jonathan Chang  (was: Carl Steinbach)

Reassigning to Jonathan.

@Jonathan: Can you please upload the patch again and this time click the box 
that gives license rights to the Apache Foundation? Thanks!

 Allow UDFs to access constant parameter values at compile time
 --

 Key: HIVE-1360
 URL: https://issues.apache.org/jira/browse/HIVE-1360
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, UDF
Affects Versions: 0.5.0
Reporter: Carl Steinbach
Assignee: Jonathan Chang
 Attachments: HIVE-1360.patch


 UDFs should be able to access constant parameter values at compile time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread Tim Armstrong (JIRA)
Improve RCFileCat performance significantly
---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor


The rcfilecat utility is extraordinarily slow: the throughput can be  0.5 MB/s 
of compressed RCFile.  We can implement a much faster version to enable faster 
export of data from Hive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Review Request: HIVE-2242: DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions

2011-08-11 Thread Sohan Jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1475/
---

Review request for hive and Paul Yang.


Summary
---

Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
partitions that have a full specification to Pre Execution hooks. It should 
also include all matches from partial specifications.

E.g., suppose you have a table
create table test_table (a string) partitioned by (p1 string, p2 string);
alter table test_table add partition (p1=1, p2=1);
alter table test_table add partition (p1=1, p2=2);
alter table test_table add partition (p1=2, p2=2);

and you run 
alter table test_table drop partition(p1=1);
Pre-execution hooks will not be passed any of the partitions. The expected 
behavior is for pre-execution hooks to get the WriteEntity's with the 
partitions p1=1/p2=1 and p1=1/p2=2
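
The expected matching can be sketched as follows. This is an illustration
only, not the change made to DDLSemanticAnalyzer: partitions are modelled as
plain maps, and the class and method names (PartialSpecMatchSketch, matching,
partition) are hypothetical. A partition matches when it agrees with every
key present in the possibly partial specification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartialSpecMatchSketch {
    // Returns every partition whose values agree with all keys present in the
    // (possibly partial) spec; keys absent from the spec are unconstrained.
    static List<Map<String, String>> matching(List<Map<String, String>> partitions,
                                              Map<String, String> spec) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> partition : partitions) {
            boolean matches = true;
            for (Map.Entry<String, String> wanted : spec.entrySet()) {
                if (!wanted.getValue().equals(partition.get(wanted.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                result.add(partition);
            }
        }
        return result;
    }

    // Helper that builds a partition of test_table from its two partition values.
    static Map<String, String> partition(String p1, String p2) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("p1", p1);
        values.put("p2", p2);
        return values;
    }

    public static void main(String[] unused) {
        List<Map<String, String>> all = new ArrayList<>();
        all.add(partition("1", "1"));
        all.add(partition("1", "2"));
        all.add(partition("2", "2"));

        Map<String, String> dropSpec = new LinkedHashMap<>();
        dropSpec.put("p1", "1"); // partial spec: p2 is left unspecified

        // The two p1=1 partitions are the ones a pre-execution hook should see.
        System.out.println(matching(all, dropSpec));
    }
}
```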


This addresses bug HIVE-2242.
https://issues.apache.org/jira/browse/HIVE-2242


Diffs
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
1140399 

Diff: https://reviews.apache.org/r/1475/diff


Testing
---


Thanks,

Sohan



[jira] [Commented] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083738#comment-13083738
 ] 

jirapos...@reviews.apache.org commented on HIVE-2242:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1475/
---

Review request for hive and Paul Yang.


Summary
---

Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
partitions that have a full specification to Pre Execution hooks. It should 
also include all matches from partial specifications.

E.g., suppose you have a table
create table test_table (a string) partitioned by (p1 string, p2 string);
alter table test_table add partition (p1=1, p2=1);
alter table test_table add partition (p1=1, p2=2);
alter table test_table add partition (p1=2, p2=2);

and you run 
alter table test_table drop partition(p1=1);
Pre-execution hooks will not be passed any of the partitions. The expected 
behavior is for pre-execution hooks to get the WriteEntity's with the 
partitions p1=1/p2=1 and p1=1/p2=2


This addresses bug HIVE-2242.
https://issues.apache.org/jira/browse/HIVE-2242


Diffs
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
1140399 

Diff: https://reviews.apache.org/r/1475/diff


Testing
---


Thanks,

Sohan



 DDL Semantic Analyzer does not pass partial specification partitions to 
 PreExecute hooks when dropping partitions
 -

 Key: HIVE-2242
 URL: https://issues.apache.org/jira/browse/HIVE-2242
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2242.1.patch


 Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
 partitions that have a full specification to Pre Execution hooks.  It should 
 also include all matches from partial specifications.
 E.g., suppose you have a table
 {{create table test_table (a string) partitioned by (p1 string, p2 string);}}
 {{alter table test_table add partition (p1=1, p2=1);}}
 {{alter table test_table add partition (p1=1, p2=2);}}
 {{alter table test_table add partition (p1=2, p2=2);}}
 and you run 
 {{alter table test_table drop partition(p1=1);}}
 Pre-execution hooks will not be passed any of the partitions.  The expected 
 behavior is for pre-execution hooks to get the WriteEntity's with the 
 partitions p1=1/p2=1 and p1=1/p2=2

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-11 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083751#comment-13083751
 ] 

Paul Yang commented on HIVE-2246:
-

There have been some issues identified with this patch. We will be doing some 
additional testing, but we might roll back so that we don't leave trunk in an 
unstable state.

 Dedupe tables' column schemas from partitions in the metastore db
 -

 Key: HIVE-2246
 URL: https://issues.apache.org/jira/browse/HIVE-2246
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Fix For: 0.8.0

 Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch, 
 HIVE-2246.8.patch


 Note: this patch proposes a schema change, and is therefore incompatible with 
 the current metastore.
 We can re-organize the JDO models to reduce space usage to keep the metastore 
 scalable for the future.  Currently, partitions are the fastest growing 
 objects in the metastore, and the metastore keeps a separate copy of the 
 columns list for each partition.  We can normalize the metastore db by 
 decoupling Columns from Storage Descriptors and not storing duplicate lists 
 of the columns for each partition. 
 An idea is to create an additional level of indirection with a Column 
 Descriptor that has a list of columns.  A table has a reference to its 
 latest Column Descriptor (note: a table may have more than one Column 
 Descriptor in the case of schema evolution).  Partitions and Indexes can 
 reference the same Column Descriptors as their parent table.
 Currently, the COLUMNS table in the metastore has roughly (number of 
 partitions + number of tables) * (average number of columns per table) rows.  
 We can reduce this to (number of tables) * (average number of columns per 
 table) rows, while incurring a small cost proportional to the number of 
 tables to store the Column Descriptors.
 Please see the latest review board for additional implementation details.
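
 As a worked example of the row-count estimate above, the sketch below
 evaluates both formulas. The input figures (1,000 tables, 1,000,000
 partitions, 50 columns per table on average) are made up for illustration
 and do not come from any real metastore; the class and method names are
 hypothetical.

```java
final class ColumnRowEstimateSketch {
    // Current layout: one copy of the column list per table and per partition.
    static long rowsBefore(long tables, long partitions, long avgColumnsPerTable) {
        return (partitions + tables) * avgColumnsPerTable;
    }

    // Proposed layout: column lists are stored once per table and shared by
    // its partitions (ignores the extra per-table Column Descriptor rows).
    static long rowsAfter(long tables, long avgColumnsPerTable) {
        return tables * avgColumnsPerTable;
    }

    public static void main(String[] unused) {
        long tables = 1_000L;          // assumed figure
        long partitions = 1_000_000L;  // assumed figure
        long avgColumns = 50L;         // assumed figure

        System.out.println("before: " + rowsBefore(tables, partitions, avgColumns));
        System.out.println("after:  " + rowsAfter(tables, avgColumns));
    }
}
```

 With these assumed inputs the estimate drops from 50,050,000 rows to 50,000
 rows. A table with several Column Descriptors from schema evolution would
 add to the second figure.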

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Review Request: rcfilecat 16x performance improvement

2011-08-11 Thread Tim Armstrong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/
---

Review request for hive, Yongqiang He, Ning Zhang, and namit jain.


Summary
---

This patch improves rcfilecat performance enormously: throughput increased from 
0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements 
I made to get to this performance:

Initial:
0.32 MB/s

Change System.out to use bigger buffer (not line buffered)
1.7MB/s

Unchecked Get:
1.75MB/s

Use StringBuilder to construct each row before writing output.
3.7MB/s

Streamline decoding:
4.16 MB/s

Use StringBuilder to buffer multiple lines:
5 MB/s

Tuning buffer sizes:
5.15 MB/s


I also added a --verbose mode which writes progress updates to stderr.
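
Two of the steps listed above (a larger, non-line-buffered output stream and
a StringBuilder that accumulates several rows before each write) can be
sketched roughly like this. It is not the code from the patch: the class
name, the tab/newline row format, and both buffer sizes are assumptions
chosen for illustration, and the stream uses the platform default charset.

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;

final class BufferedRowWriterSketch {
    // Both sizes are assumed values, not the tuned sizes from the patch.
    static final int STREAM_BUFFER_BYTES = 128 * 1024;
    static final int FLUSH_THRESHOLD_CHARS = 64 * 1024;

    // Writes rows as tab-separated lines, batching many rows per write call.
    static void writeRows(String[][] rows, OutputStream sink) {
        // autoflush=false avoids flushing the stream on every newline.
        PrintStream out = new PrintStream(
            new BufferedOutputStream(sink, STREAM_BUFFER_BYTES), false);
        StringBuilder pending = new StringBuilder();
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    pending.append('\t');
                }
                pending.append(row[i]);
            }
            pending.append('\n');
            // Hand the accumulated rows to the stream only once enough has built up.
            if (pending.length() >= FLUSH_THRESHOLD_CHARS) {
                out.print(pending.toString());
                pending.setLength(0);
            }
        }
        out.print(pending.toString()); // remaining rows
        out.flush();
    }

    public static void main(String[] unused) {
        String[][] rows = { {"1", "alice"}, {"2", "bob"} };
        writeRows(rows, System.out);
    }
}
```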


This addresses bug HIVE-2370.
https://issues.apache.org/jira/browse/HIVE-2370


Diffs
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 

Diff: https://reviews.apache.org/r/1474/diff


Testing
---

Used diff to check output was same as old version of RCFileCat


Thanks,

Tim



Re: Review Request: rcfilecat 16x performance improvement

2011-08-11 Thread Tim Armstrong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/
---

(Updated 2011-08-11 22:44:48.620762)


Review request for hive, Yongqiang He, Ning Zhang, and namit jain.


Summary (updated)
---

This patch improves rcfilecat performance enormously: throughput increased from 
0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements 
I made to get to this performance:

Initial:
0.32 MB/s

Change System.out to use bigger buffer (not line buffered)
1.7MB/s

Unchecked Get:
1.75MB/s

Use StringBuilder to construct each row before writing output.
3.7MB/s

Streamline decoding:
4.16 MB/s

Use StringBuilder to buffer multiple lines:
5 MB/s

Tuning buffer sizes:
5.15 MB/s


I also added a --verbose mode which writes progress updates to stderr.


This addresses bug HIVE-2370.
https://issues.apache.org/jira/browse/HIVE-2370


Diffs
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 

Diff: https://reviews.apache.org/r/1474/diff


Testing
---

Used diff to check output was same as old version of RCFileCat


Thanks,

Tim



[jira] [Updated] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated HIVE-2370:


Status: Patch Available  (was: Open)

 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be  0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083765#comment-13083765
 ] 

jirapos...@reviews.apache.org commented on HIVE-2370:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/
---

Review request for hive, Yongqiang He, Ning Zhang, and namit jain.


Summary
---

This patch improves rcfilecat performance enormously: throughput increased from 
0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements 
I made to get to this performance:

Initial:
0.32 MB/s

Change System.out to use bigger buffer (not line buffered)
1.7MB/s

Unchecked Get:
1.75MB/s

Use StringBuilder to construct each row before writing output.
3.7MB/s

Streamline decoding:
4.16 MB/s

Use StringBuilder to buffer multiple lines:
5 MB/s

Tuning buffer sizes:
5.15 MB/s


I also added a --verbose mode which writes progress updates to stderr.


This addresses bug HIVE-2370.
https://issues.apache.org/jira/browse/HIVE-2370


Diffs
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 

Diff: https://reviews.apache.org/r/1474/diff


Testing
---

Used diff to check output was same as old version of RCFileCat


Thanks,

Tim



 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be  0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated HIVE-2370:


Attachment: rcfilecat_2011-08-11.patch

 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be  0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083766#comment-13083766
 ] 

Tim Armstrong commented on HIVE-2370:
-

Diff is available on reviewboard:
https://reviews.apache.org/r/1474/

 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be  0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083768#comment-13083768
 ] 

jirapos...@reviews.apache.org commented on HIVE-2370:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/
---

(Updated 2011-08-11 22:44:48.620762)


Review request for hive, Yongqiang He, Ning Zhang, and namit jain.


Summary (updated)
---

This patch improves rcfilecat performance enormously: throughput increased from 
0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements 
I made to get to this performance:

Initial:
0.32 MB/s

Change System.out to use bigger buffer (not line buffered)
1.7MB/s

Unchecked Get:
1.75MB/s

Use StringBuilder to construct each row before writing output.
3.7MB/s

Streamline decoding:
4.16 MB/s

Use StringBuilder to buffer multiple lines:
5 MB/s

Tuning buffer sizes:
5.15 MB/s


I also added a --verbose mode which writes progress updates to stderr.


This addresses bug HIVE-2370.
https://issues.apache.org/jira/browse/HIVE-2370


Diffs
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 

Diff: https://reviews.apache.org/r/1474/diff


Testing
---

Used diff to check output was same as old version of RCFileCat


Thanks,

Tim



 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be  0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: rcfilecat 16x performance improvement

2011-08-11 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/#review1412
---


Great job! Does this number indicate the read and write speed or just the read 
(including decompression) part? 


trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3266

can you remove all these TABs?



trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3267

make 2048 a static constant variable. 


- Ning


On 2011-08-11 22:44:48, Tim Armstrong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1474/
 ---
 
 (Updated 2011-08-11 22:44:48)
 
 
 Review request for hive, Yongqiang He, Ning Zhang, and namit jain.
 
 
 Summary
 ---
 
 This patch improves rcfilecat performance enormously: throughput increased 
 from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of 
 improvements I made to get to this performance:
 
 Initial:
 0.32 MB/s
 
 Change System.out to use bigger buffer (not line buffered)
 1.7MB/s
 
 Unchecked Get:
 1.75MB/s
 
 Use StringBuilder to construct each row before writing output.
 3.7MB/s
 
 Streamline decoding:
 4.16 MB/s
 
 Use StringBuilder to buffer multiple lines:
 5 MB/s
 
 Tuning buffer sizes:
 5.15 MB/s
 
 
 I also added a --verbose mode which writes progress updates to stderr.
 
 
 This addresses bug HIVE-2370.
 https://issues.apache.org/jira/browse/HIVE-2370
 
 
 Diffs
 -
 
   trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 
 
 Diff: https://reviews.apache.org/r/1474/diff
 
 
 Testing
 ---
 
 Used diff to check output was same as old version of RCFileCat
 
 
 Thanks,
 
 Tim
 




Re: Review Request: Optimisation for RCFile reading to improve CPU usage.

2011-08-11 Thread Tim Armstrong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1309/
---

(Updated 2011-08-11 23:07:05.774106)


Review request for hive, Yongqiang He and Ning Zhang.


Changes
---

Minor change to avoid a compilation problem.


Summary
---

By tweaking the RCFile$Reader implementation to allow more efficient memory 
access I was able to reduce CPU usage. I measured the speed required to scan a 
gzipped RCFile, decompress and assemble into records. CPU time was reduced by 
about 7% for a full table scan. An improvement of about 2% was realised when a 
smaller subset of columns (3-5 out of tens) were selected.


This addresses bug HIVE-2350.
https://issues.apache.org/jira/browse/HIVE-2350


Diffs (updated)
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1156839 

Diff: https://reviews.apache.org/r/1309/diff


Testing
---

Ran TestRCFile unit test.  Manually tested reading from warehouse table.


Thanks,

Tim



[jira] [Commented] (HIVE-2350) Improve RCFile Read Speed

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083793#comment-13083793
 ] 

jirapos...@reviews.apache.org commented on HIVE-2350:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1309/
---

(Updated 2011-08-11 23:07:05.774106)


Review request for hive, Yongqiang He and Ning Zhang.


Changes
---

Minor change to avoid a compilation problem.


Summary
---

By tweaking the RCFile$Reader implementation to allow more efficient memory 
access I was able to reduce CPU usage. I measured the speed required to scan a 
gzipped RCFile, decompress and assemble into records. CPU time was reduced by 
about 7% for a full table scan. An improvement of about 2% was realised when a 
smaller subset of columns (3-5 out of tens) were selected.


This addresses bug HIVE-2350.
https://issues.apache.org/jira/browse/HIVE-2350


Diffs (updated)
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1156839 

Diff: https://reviews.apache.org/r/1309/diff


Testing
---

Ran TestRCFile unit test.  Manually tested reading from warehouse table.


Thanks,

Tim



 Improve RCFile Read Speed
 -

 Key: HIVE-2350
 URL: https://issues.apache.org/jira/browse/HIVE-2350
 Project: Hive
  Issue Type: Improvement
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfile-2011-08-04.diff, rcfile_opt_2011-08-05.diff, 
 rcfile_opt_2011-08-05b.diff, rcfile_opt_2011-08-11.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 By tweaking the RCFile$Reader implementation to allow more efficient memory 
 access I was able to reduce CPU usage.  I measured the speed required to scan 
 a gzipped RCFile, decompress and assemble into records.  CPU time was reduced 
 by about 7% for a full table scan.  An improvement of about 2% was realised 
 when a smaller subset of columns (3-5 out of tens) were selected.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083792#comment-13083792
 ] 

jirapos...@reviews.apache.org commented on HIVE-2370:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/#review1412
---


Great job! Does this number indicate the read and write speed or just the read 
(including decompression) part? 


trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3266

can you remove all these TABs?



trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3267

make 2048 a static constant variable. 


- Ning


On 2011-08-11 22:44:48, Tim Armstrong wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1474/
bq.  ---
bq.  
bq.  (Updated 2011-08-11 22:44:48)
bq.  
bq.  
bq.  Review request for hive, Yongqiang He, Ning Zhang, and namit jain.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch improves rcfilecat performance enormously: throughput increased 
from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of 
improvements I made to get to this performance:
bq.  
bq.  Initial:
bq.  0.32 MB/s
bq.  
bq.  Change System.out to use bigger buffer (not line buffered)
bq.  1.7MB/s
bq.  
bq.  Unchecked Get:
bq.  1.75MB/s
bq.  
bq.  Use StringBuilder to construct each row before writing output.
bq.  3.7MB/s
bq.  
bq.  Streamline decoding:
bq.  4.16 MB/s
bq.  
bq.  Use StringBuilder to buffer multiple lines:
bq.  5 MB/s
bq.  
bq.  Tuning buffer sizes:
bq.  5.15 MB/s
bq.  
bq.  
bq.  I also added a --verbose mode which writes progress updates to stderr.
bq.  
bq.  
bq.  This addresses bug HIVE-2370.
bq.  https://issues.apache.org/jira/browse/HIVE-2370
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 
bq.  
bq.  Diff: https://reviews.apache.org/r/1474/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Used diff to check output was same as old version of RCFileCat
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Tim
bq.  
bq.



 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be  0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2350) Improve RCFile Read Speed

2011-08-11 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated HIVE-2350:


Attachment: rcfile_opt_2011-08-11.patch

 Improve RCFile Read Speed
 -

 Key: HIVE-2350
 URL: https://issues.apache.org/jira/browse/HIVE-2350
 Project: Hive
  Issue Type: Improvement
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfile-2011-08-04.diff, rcfile_opt_2011-08-05.diff, 
 rcfile_opt_2011-08-05b.diff, rcfile_opt_2011-08-11.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 By tweaking the RCFile$Reader implementation to allow more efficient memory 
 access I was able to reduce CPU usage.  I measured the speed required to scan 
 a gzipped RCFile, decompress and assemble into records.  CPU time was reduced 
 by about 7% for a full table scan.  An improvement of about 2% was realised 
 when a smaller subset of columns (3-5 out of tens) were selected.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-1360) Allow UDFs to access constant parameter values at compile time

2011-08-11 Thread Jonathan Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Chang updated HIVE-1360:
-

Attachment: HIVE-1360.patch

Like previous except license is now granted and also fixes show_functions.q.

 Allow UDFs to access constant parameter values at compile time
 --

 Key: HIVE-1360
 URL: https://issues.apache.org/jira/browse/HIVE-1360
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, UDF
Affects Versions: 0.5.0
Reporter: Carl Steinbach
Assignee: Jonathan Chang
 Attachments: HIVE-1360.patch, HIVE-1360.patch


 UDFs should be able to access constant parameter values at compile time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: Support archiving for multiple partitions if the table is partitioned by multiple columns

2011-08-11 Thread Marcin Kurczych

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1259/
---

(Updated 2011-08-11 23:31:47.018762)


Review request for hive, Paul Yang and namit jain.


Changes
---

Fixed configuration (removed hook)


Summary
---

Allows archiving at a chosen level. When a table is partitioned by ds, hr, min, 
it allows archiving at the ds level, the hr level and the min level. The 
corresponding syntax is:
ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08');
ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08', hr='11');
ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08', hr='11', min='30');

You cannot do much to archived partitions. You can read them. You cannot write 
to them / overwrite them. You can drop single archived partitions, but not 
parts of bigger archives.
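
One way to picture the archive level is as the number of leading partition
columns fixed by the PARTITION clause. The sketch below is hypothetical and
not taken from the patch. It assumes, based only on the three examples
above, that a valid spec must name a leading prefix of the partition columns
(ds, then hr, then min); the real validation rules may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ArchiveLevelSketch {
    // Returns how many leading partition columns the spec fixes, or -1 if the
    // spec is empty, skips a column, or names a column that is not a
    // partition column (prefix rule is an assumption, see lead-in).
    static int archiveLevel(List<String> partitionColumns, Map<String, String> spec) {
        int level = 0;
        while (level < partitionColumns.size()
                && spec.containsKey(partitionColumns.get(level))) {
            level++;
        }
        return (level > 0 && level == spec.size()) ? level : -1;
    }

    public static void main(String[] unused) {
        List<String> columns = Arrays.asList("ds", "hr", "min");

        Map<String, String> dayAndHour = new HashMap<>();
        dayAndHour.put("ds", "2008-04-08");
        dayAndHour.put("hr", "11");

        Map<String, String> hourOnly = new HashMap<>();
        hourOnly.put("hr", "11");

        System.out.println(archiveLevel(columns, dayAndHour)); // hr level
        System.out.println(archiveLevel(columns, hourOnly));   // rejected: skips ds
    }
}
```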


Diffs (updated)
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1153271 
  trunk/metastore/if/hive_metastore.thrift 1153271 
  trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h 1153271 
  trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp 1153271 
  
trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Constants.java
 1153271 
  
trunk/metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_constants.php
 1153271 
  trunk/metastore/src/gen/thrift/gen-py/hive_metastore/constants.py 1153271 
  trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb 1153271 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
1153271 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1153271 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ArchiveUtils.java 
PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153271 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1153271 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
 1153271 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/DummyPartition.java 
1153271 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1153271 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 1153271 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
1153271 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
1153271 
  trunk/ql/src/test/queries/clientnegative/archive_insert1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_insert2.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_insert3.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_insert4.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_multi1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_multi2.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_multi3.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_multi4.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_multi5.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_multi6.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_multi7.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_partspec1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_partspec2.q PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/archive_partspec3.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/archive_corrupt.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/archive_multi.q PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive1.q.out 1153271 
  trunk/ql/src/test/results/clientnegative/archive2.q.out 1153271 
  trunk/ql/src/test/results/clientnegative/archive_insert1.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_insert2.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_insert3.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_insert4.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_multi1.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_multi2.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_multi3.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_multi4.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_multi5.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_multi6.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_multi7.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_partspec1.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_partspec2.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/archive_partspec3.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/archive_corrupt.q.out PRE-CREATION 
 

[jira] [Updated] (HIVE-2278) Support archiving for multiple partitions if the table is partitioned by multiple columns

2011-08-11 Thread Marcin Kurczych (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcin Kurczych updated HIVE-2278:
--

Attachment: HIVE-2278.6.patch

 Support archiving for multiple partitions if the table is partitioned by 
 multiple columns
 -

 Key: HIVE-2278
 URL: https://issues.apache.org/jira/browse/HIVE-2278
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: Marcin Kurczych
 Attachments: HIVE-2278.2.patch, HIVE-2278.3.patch, HIVE-2278.4.patch, 
 HIVE-2278.5.patch, HIVE-2278.5.patch, HIVE-2278.6.patch, hive.2278.1.patch


 If a table is partitioned by ds,hr
 it should be possible to archive all the files in ds to reduce the number of 
 files

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2278) Support archiving for multiple partitions if the table is partitioned by multiple columns

2011-08-11 Thread Marcin Kurczych (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcin Kurczych updated HIVE-2278:
--

Status: Open  (was: Patch Available)

 Support archiving for multiple partitions if the table is partitioned by 
 multiple columns
 -

 Key: HIVE-2278
 URL: https://issues.apache.org/jira/browse/HIVE-2278
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: Marcin Kurczych
 Attachments: HIVE-2278.2.patch, HIVE-2278.3.patch, HIVE-2278.4.patch, 
 HIVE-2278.5.patch, HIVE-2278.5.patch, HIVE-2278.6.patch, hive.2278.1.patch


 If a table is partitioned by ds,hr
 it should be possible to archive all the files in ds to reduce the number of 
 files





[jira] [Updated] (HIVE-2278) Support archiving for multiple partitions if the table is partitioned by multiple columns

2011-08-11 Thread Marcin Kurczych (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcin Kurczych updated HIVE-2278:
--

Status: Patch Available  (was: Open)

 Support archiving for multiple partitions if the table is partitioned by 
 multiple columns
 -

 Key: HIVE-2278
 URL: https://issues.apache.org/jira/browse/HIVE-2278
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: Marcin Kurczych
 Attachments: HIVE-2278.2.patch, HIVE-2278.3.patch, HIVE-2278.4.patch, 
 HIVE-2278.5.patch, HIVE-2278.5.patch, HIVE-2278.6.patch, hive.2278.1.patch


 If a table is partitioned by ds,hr
 it should be possible to archive all the files in ds to reduce the number of 
 files





HIVE-2282

2011-08-11 Thread John Sichi
The unit test sample_islocalmode_hook.q has been failing consistently for me 
with the diff below.  Jenkins builds passed with it so I'm guessing it must be 
something environmental?

Also, Siying, it looks like you committed this, but did not resolve it in JIRA?



[junit] 97,98c97,98
[junit] < PREHOOK: Output: 
file:/data/users/jsichi/open/test-trunk/build/ql/scratchdir/hive_2011-08-11_16-34-43_442_7150692818727736391/-mr-1
[junit] < 1028
[junit] ---
[junit] > PREHOOK: Output: 
file:/data/users/kevinwilfong/trunk/VENDOR.hive/trunk/build/ql/scratchdir/hive_2011-07-22_10-31-04_069_8883954538684297085/-mr-1
[junit] > 0



RE: HIVE-2282

2011-08-11 Thread Siying Dong
Kevin, there is probably still some non-determinism in your test case. Can 
you examine it carefully?

-Original Message-
From: John Sichi 
Sent: Thursday, August 11, 2011 4:44 PM
To: Siying Dong; Kevin Wilfong
Cc: dev@hive.apache.org
Subject: HIVE-2282

The unit test sample_islocalmode_hook.q has been failing consistently for me 
with the diff below.  Jenkins builds passed with it so I'm guessing it must be 
something environmental?

Also, Siying, it looks like you committed this, but did not resolve it in JIRA?



[junit] 97,98c97,98
[junit] < PREHOOK: Output: 
file:/data/users/jsichi/open/test-trunk/build/ql/scratchdir/hive_2011-08-11_16-34-43_442_7150692818727736391/-mr-1
[junit] < 1028
[junit] ---
[junit] > PREHOOK: Output: 
file:/data/users/kevinwilfong/trunk/VENDOR.hive/trunk/build/ql/scratchdir/hive_2011-07-22_10-31-04_069_8883954538684297085/-mr-1
[junit] > 0



Re: HIVE-2282

2011-08-11 Thread Kevin Wilfong
I've seen this issue before, a fix was going to go out as part of someone
else's change, but for some reason it hasn't been committed.  I've
arranged to remove the fix from that change, and I'll make a new JIRA just
for the fix.

On 8/11/11 4:49 PM, Siying Dong siyin...@fb.com wrote:

Kevin, there is probably still some non-determinism in your test case.
Can you examine it carefully?

-Original Message-
From: John Sichi 
Sent: Thursday, August 11, 2011 4:44 PM
To: Siying Dong; Kevin Wilfong
Cc: dev@hive.apache.org
Subject: HIVE-2282

The unit test sample_islocalmode_hook.q has been failing consistently for
me with the diff below.  Jenkins builds passed with it so I'm guessing it
must be something environmental?

Also, Siying, it looks like you committed this, but did not resolve it in
JIRA?



[junit] 97,98c97,98
[junit] < PREHOOK: Output:
file:/data/users/jsichi/open/test-trunk/build/ql/scratchdir/hive_2011-08-11_16-34-43_442_7150692818727736391/-mr-1
[junit] < 1028
[junit] ---
[junit] > PREHOOK: Output:
file:/data/users/kevinwilfong/trunk/VENDOR.hive/trunk/build/ql/scratchdir/hive_2011-07-22_10-31-04_069_8883954538684297085/-mr-1
[junit] > 0




Re: HIVE-2282

2011-08-11 Thread John Sichi
Thanks!

JVS

On Aug 11, 2011, at 4:53 PM, Kevin Wilfong wrote:

 I've seen this issue before, a fix was going to go out as part of someone
 else's change, but for some reason it hasn't been committed.  I've
 arranged to remove the fix from that change, and I'll make a new JIRA just
 for the fix.
 
 On 8/11/11 4:49 PM, Siying Dong siyin...@fb.com wrote:
 
 Kevin, there is probably still some non-determinism in your test case.
 Can you examine it carefully?
 
 -Original Message-
 From: John Sichi 
 Sent: Thursday, August 11, 2011 4:44 PM
 To: Siying Dong; Kevin Wilfong
 Cc: dev@hive.apache.org
 Subject: HIVE-2282
 
 The unit test sample_islocalmode_hook.q has been failing consistently for
 me with the diff below.  Jenkins builds passed with it so I'm guessing it
 must be something environmental?
 
 Also, Siying, it looks like you committed this, but did not resolve it in
 JIRA?
 
 
 
   [junit] 97,98c97,98
   [junit] < PREHOOK: Output:
 file:/data/users/jsichi/open/test-trunk/build/ql/scratchdir/hive_2011-08-11_16-34-43_442_7150692818727736391/-mr-1
   [junit] < 1028
   [junit] ---
   [junit] > PREHOOK: Output:
 file:/data/users/kevinwilfong/trunk/VENDOR.hive/trunk/build/ql/scratchdir/hive_2011-07-22_10-31-04_069_8883954538684297085/-mr-1
   [junit] > 0
 
 



[jira] [Created] (HIVE-2371) sample_islocalmode_hook.q test is non-deterministic

2011-08-11 Thread Kevin Wilfong (JIRA)
sample_islocalmode_hook.q test is non-deterministic
---

 Key: HIVE-2371
 URL: https://issues.apache.org/jira/browse/HIVE-2371
 Project: Hive
  Issue Type: Bug
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong








[jira] [Updated] (HIVE-2371) sample_islocalmode_hook.q test is non-deterministic

2011-08-11 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-2371:


Attachment: HIVE-2371.1.patch.txt

 sample_islocalmode_hook.q test is non-deterministic
 ---

 Key: HIVE-2371
 URL: https://issues.apache.org/jira/browse/HIVE-2371
 Project: Hive
  Issue Type: Bug
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-2371.1.patch.txt








Review Request: sample_islocalmode_hook.q test is non-deterministic

2011-08-11 Thread Kevin Wilfong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1477/
---

Review request for hive and Siying Dong.


Summary
---

Adding order by to the two queries used to create the test tables makes the 
test deterministic.


This addresses bug HIVE-2371.
https://issues.apache.org/jira/browse/HIVE-2371


Diffs
-

  trunk/ql/src/test/queries/clientpositive/sample_islocalmode_hook.q 1156861 
  trunk/ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out 
1156861 

Diff: https://reviews.apache.org/r/1477/diff


Testing
---

I ran the test and verified it passed.

I also had a person who had been seeing the test fail due to non-determinism run 
the test and verify that it passed.


Thanks,

Kevin



[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083835#comment-13083835
 ] 

Tim Armstrong commented on HIVE-2370:
-

I'm not sure exactly what you mean about read and write speeds.

I tested it reading a file off a remote DFS instance, redirecting the output to 
a local file.  The time spent writing the output is negligible.  

The largest part of the time is spent doing Unicode conversions to get it into a 
Java CharBuffer, and then writing it to the console.  Decompression and 
deserialisation also take up a large part of CPU time.


 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be < 0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.





[jira] [Commented] (HIVE-2371) sample_islocalmode_hook.q test is non-deterministic

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083837#comment-13083837
 ] 

jirapos...@reviews.apache.org commented on HIVE-2371:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1477/
---

Review request for hive and Siying Dong.


Summary
---

Adding order by to the two queries used to create the test tables makes the 
test deterministic.


This addresses bug HIVE-2371.
https://issues.apache.org/jira/browse/HIVE-2371


Diffs
-

  trunk/ql/src/test/queries/clientpositive/sample_islocalmode_hook.q 1156861 
  trunk/ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out 
1156861 

Diff: https://reviews.apache.org/r/1477/diff


Testing
---

I ran the test and verified it passed.

I also had a person who had been seeing the test fail due to non-determinism run 
the test and verify that it passed.


Thanks,

Kevin



 sample_islocalmode_hook.q test is non-deterministic
 ---

 Key: HIVE-2371
 URL: https://issues.apache.org/jira/browse/HIVE-2371
 Project: Hive
  Issue Type: Bug
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-2371.1.patch.txt








Re: Review Request: rcfilecat 16x performance improvement

2011-08-11 Thread Tim Armstrong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/
---

(Updated 2011-08-12 00:22:11.295461)


Review request for hive, Yongqiang He, Ning Zhang, and namit jain.


Changes
---

Stripped out whitespace at end of line of old version.


Summary
---

This patch improves rcfilecat performance enormously: throughput increased from 
0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements 
I made to get to this performance:

Initial:
0.32 MB/s

Change System.out to use bigger buffer (not line buffered)
1.7MB/s

Unchecked Get:
1.75MB/s

Use StringBuilder to construct each row before writing output.
3.7MB/s

Streamline decoding:
4.16 MB/s

Use StringBuilder to buffer multiple lines:
5 MB/s

Tuning buffer sizes:
5.15 MB/s


I also added a --verbose mode which writes progress updates to stderr.
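
The two output-side steps listed above (a larger, non-line-buffered stdout and a StringBuilder that batches rows) can be illustrated with a minimal sketch. This is not the code from the patch: the class name, method name, and both buffer sizes are hypothetical placeholders, and the rows are plain strings rather than decoded RCFile columns.

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;

public class BufferedCatSketch {
  // Hypothetical sizes; the tuned values from the patch are not stated in this thread.
  private static final int OUTPUT_BUFFER_BYTES = 1024 * 1024;
  private static final int FLUSH_THRESHOLD_CHARS = 64 * 1024;

  /** Writes tab-separated rows to sink, batching many rows into each write. */
  static void writeRows(String[][] rows, OutputStream sink) {
    // autoFlush=false, so a newline no longer forces a flush of the stream.
    PrintStream out =
        new PrintStream(new BufferedOutputStream(sink, OUTPUT_BUFFER_BYTES), false);
    StringBuilder buf = new StringBuilder(FLUSH_THRESHOLD_CHARS);
    for (String[] row : rows) {
      for (int i = 0; i < row.length; i++) {
        if (i > 0) {
          buf.append('\t');
        }
        buf.append(row[i]);
      }
      buf.append('\n');
      // Hand the accumulated rows to the stream only once the batch is large enough.
      if (buf.length() >= FLUSH_THRESHOLD_CHARS) {
        out.print(buf);
        buf.setLength(0);
      }
    }
    out.print(buf); // remaining rows
    out.flush();    // flush, but do not close, the caller's stream
  }

  public static void main(String[] args) {
    writeRows(new String[][] {{"1", "a"}, {"2", "b"}}, System.out);
  }
}
```

The sketch flushes rather than closes the stream, so passing System.out stays safe; how much each step helps will depend on the data and the environment, as the per-step numbers above suggest.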


This addresses bug HIVE-2370.
https://issues.apache.org/jira/browse/HIVE-2370


Diffs (updated)
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 

Diff: https://reviews.apache.org/r/1474/diff


Testing
---

Used diff to check output was same as old version of RCFileCat


Thanks,

Tim



[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083839#comment-13083839
 ] 

jirapos...@reviews.apache.org commented on HIVE-2370:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/
---

(Updated 2011-08-12 00:22:11.295461)


Review request for hive, Yongqiang He, Ning Zhang, and namit jain.


Changes
---

Stripped out whitespace at end of line of old version.


Summary
---

This patch improves rcfilecat performance enormously: throughput increased from 
0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements 
I made to get to this performance:

Initial:
0.32 MB/s

Change System.out to use bigger buffer (not line buffered)
1.7MB/s

Unchecked Get:
1.75MB/s

Use StringBuilder to construct each row before writing output.
3.7MB/s

Streamline decoding:
4.16 MB/s

Use StringBuilder to buffer multiple lines:
5 MB/s

Tuning buffer sizes:
5.15 MB/s


I also added a --verbose mode which writes progress updates to stderr.


This addresses bug HIVE-2370.
https://issues.apache.org/jira/browse/HIVE-2370


Diffs (updated)
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 

Diff: https://reviews.apache.org/r/1474/diff


Testing
---

Used diff to check output was same as old version of RCFileCat


Thanks,

Tim



 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be < 0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.





[jira] [Commented] (HIVE-1538) FilterOperator is applied twice with ppd on.

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083847#comment-13083847
 ] 

Hudson commented on HIVE-1538:
--

Integrated in Hive-trunk-h0.21 #889 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/889/])
HIVE-1538. filter is removed due to regression of HIVE-1538
(Amareshwari Sriramadasu via jvs)

jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156787
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
* /hive/trunk/ql/src/test/results/clientpositive/ppd_udf_col.q.out
* /hive/trunk/ql/src/test/queries/clientpositive/ppd_udf_col.q
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java


 FilterOperator is applied twice with ppd on.
 

 Key: HIVE-1538
 URL: https://issues.apache.org/jira/browse/HIVE-1538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.8.0

 Attachments: patch-1538-1.txt, patch-1538-2.txt, patch-1538-3.txt, 
 patch-1538-4.txt, patch-1538.txt


 With hive.optimize.ppd set to true, FilterOperator is applied twice, and it 
 seems the second operator always filters zero rows.





[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes

2011-08-11 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083854#comment-13083854
 ] 

Paul Yang commented on HIVE-2322:
-

+1. Tested and will commit.

 Add ColumnarSerDe to the list of native SerDes
 --

 Key: HIVE-2322
 URL: https://issues.apache.org/jira/browse/HIVE-2322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Serializers/Deserializers
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, 
 HIVE-2322.4.patch, HIVE-2322.5.patch


 We store metadata about ColumnarSerDes in the metastore, so it should be 
 considered a native SerDe.  Then, column information can be retrieved from 
 the metastore instead of from deserialization.
 Currently, for non-native SerDes, column comments are only shown as "from 
 deserializer".  Adding ColumnarSerDe to the list of native SerDes will 
 persist column comments.  See HIVE-2171 for persisting the column comments of 
 custom SerDes.





[jira] [Updated] (HIVE-2156) Improve error messages emitted during task execution

2011-08-11 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2156:
-

   Resolution: Fixed
Fix Version/s: 0.8.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Syed!

 Improve error messages emitted during task execution
 

 Key: HIVE-2156
 URL: https://issues.apache.org/jira/browse/HIVE-2156
 Project: Hive
  Issue Type: Improvement
Reporter: Syed S. Albiz
Assignee: Syed S. Albiz
 Fix For: 0.8.0

 Attachments: HIVE-2156.1.patch, HIVE-2156.10.patch, 
 HIVE-2156.11.patch, HIVE-2156.12.patch, HIVE-2156.13.patch, 
 HIVE-2156.2.patch, HIVE-2156.4.patch, HIVE-2156.8.patch, HIVE-2156.9.patch


 Follow-up to HIVE-1731
 A number of issues were related to reporting errors from task execution and 
 surfacing these in a more useful form.
 Currently a cryptic message with Execution Error and a return code and 
 class name of the task is emitted.
 The most useful log messages here are emitted to the local logs, which can be 
 found through jobtracker. Having either a pointer to these logs as part of 
 the error message or the actual content would improve the usefulness 
 substantially. It may also warrant looking into how the underlying error 
 reporting through Hadoop is done and if more information can be propagated up 
 from there.
 Specific issues raised in  HIVE-1731:
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 * issue was in regexp_extract syntax
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 * tried: desc table_does_not_exist;





[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes

2011-08-11 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083865#comment-13083865
 ] 

Paul Yang commented on HIVE-2322:
-

Committed. Thanks Sohan!

 Add ColumnarSerDe to the list of native SerDes
 --

 Key: HIVE-2322
 URL: https://issues.apache.org/jira/browse/HIVE-2322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Serializers/Deserializers
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, 
 HIVE-2322.4.patch, HIVE-2322.5.patch


 We store metadata about ColumnarSerDes in the metastore, so it should be 
 considered a native SerDe.  Then, column information can be retrieved from 
 the metastore instead of from deserialization.
 Currently, for non-native SerDes, column comments are only shown as "from 
 deserializer".  Adding ColumnarSerDe to the list of native SerDes will 
 persist column comments.  See HIVE-2171 for persisting the column comments of 
 custom SerDes.





Re: Review Request: rcfilecat 16x performance improvement

2011-08-11 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/#review1414
---



trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3271

This should probably be done after we finish processing the command line 
options.



trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3269

1024*1024 should be replaced with a static final variable.



trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3270

Another constant that should be converted to a static final.


- Carl


On 2011-08-12 00:22:11, Tim Armstrong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/1474/
 ---
 
 (Updated 2011-08-12 00:22:11)
 
 
 Review request for hive, Yongqiang He, Ning Zhang, and namit jain.
 
 
 Summary
 ---
 
 This patch improves rcfilecat performance enormously: throughput increased 
 from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of 
 improvements I made to get to this performance:
 
 Initial:
 0.32 MB/s
 
 Change System.out to use bigger buffer (not line buffered)
 1.7MB/s
 
 Unchecked Get:
 1.75MB/s
 
 Use StringBuilder to construct each row before writing output.
 3.7MB/s
 
 Streamline decoding:
 4.16 MB/s
 
 Use StringBuilder to buffer multiple lines:
 5 MB/s
 
 Tuning buffer sizes:
 5.15 MB/s
 
 
 I also added a --verbose mode which writes progress updates to stderr.
 
 
 This addresses bug HIVE-2370.
 https://issues.apache.org/jira/browse/HIVE-2370
 
 
 Diffs
 -
 
   trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 
 
 Diff: https://reviews.apache.org/r/1474/diff
 
 
 Testing
 ---
 
 Used diff to check output was same as old version of RCFileCat
 
 
 Thanks,
 
 Tim
 




[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083870#comment-13083870
 ] 

jirapos...@reviews.apache.org commented on HIVE-2370:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/#review1414
---



trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3271

This should probably be done after we finish processing the command line 
options.



trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3269

1024*1024 should be replaced with a static final variable.



trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
https://reviews.apache.org/r/1474/#comment3270

Another constant that should be converted to a static final.


- Carl


On 2011-08-12 00:22:11, Tim Armstrong wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1474/
bq.  ---
bq.  
bq.  (Updated 2011-08-12 00:22:11)
bq.  
bq.  
bq.  Review request for hive, Yongqiang He, Ning Zhang, and namit jain.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch improves rcfilecat performance enormously: throughput increased 
from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of 
improvements I made to get to this performance:
bq.  
bq.  Initial:
bq.  0.32 MB/s
bq.  
bq.  Change System.out to use bigger buffer (not line buffered)
bq.  1.7MB/s
bq.  
bq.  Unchecked Get:
bq.  1.75MB/s
bq.  
bq.  Use StringBuilder to construct each row before writing output.
bq.  3.7MB/s
bq.  
bq.  Streamline decoding:
bq.  4.16 MB/s
bq.  
bq.  Use StringBuilder to buffer multiple lines:
bq.  5 MB/s
bq.  
bq.  Tuning buffer sizes:
bq.  5.15 MB/s
bq.  
bq.  
bq.  I also added a --verbose mode which writes progress updates to stderr.
bq.  
bq.  
bq.  This addresses bug HIVE-2370.
bq.  https://issues.apache.org/jira/browse/HIVE-2370
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 
bq.  
bq.  Diff: https://reviews.apache.org/r/1474/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Used diff to check output was same as old version of RCFileCat
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Tim
bq.  
bq.



 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be < 0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.





[jira] [Commented] (HIVE-1989) recognize transitivity of predicates on join keys

2011-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083887#comment-13083887
 ] 

John Sichi commented on HIVE-1989:
--

Usually when we add a new optimization, we add a corresponding conf parameter 
so that we can disable it if it causes trouble.  Add it to HiveConf.java and 
conf/hive-default.xml

 recognize transitivity of predicates on join keys
 -

 Key: HIVE-1989
 URL: https://issues.apache.org/jira/browse/HIVE-1989
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Charles Chen
 Attachments: HIVE-1989v1.patch


 Given
 {noformat}
 set hive.mapred.mode=strict;
 create table invites (foo int, bar string) partitioned by (ds string);
 create table invites2 (foo int, bar string) partitioned by (ds string);
 select count(*) from invites join invites2 on invites.ds=invites2.ds where 
 invites.ds='2011-01-01';
 {noformat}
 currently an error occurs:
 {noformat}
 Error in semantic analysis: No Partition Predicate Found for Alias invites2 
 Table invites2
 {noformat}
 The optimizer should be able to infer a predicate on invites2 via 
 transitivity.  The current lack places a burden on the user to add a 
 redundant predicate, and makes impossible (at least in strict mode) join 
 views where both underlying tables are partitioned (the join select list has 
 to pick one of the tables arbitrarily).
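
The inference requested above can be sketched in a few lines. This is only an illustration of the idea, not the approach taken in the attached patch: the class and method names are hypothetical, columns and literals are plain strings, and the sketch assumes inner-join equalities only (it does not handle outer joins or conflicting constants).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TransitivePredicateSketch {
  /**
   * joinEqualities holds pairs of columns equated in the ON clause,
   * e.g. {"invites.ds", "invites2.ds"}; constants maps a column to the
   * literal it is compared with in the WHERE clause.
   * Returns the given constants plus every one inferred by transitivity.
   */
  static Map<String, String> infer(List<String[]> joinEqualities,
                                   Map<String, String> constants) {
    Map<String, String> known = new LinkedHashMap<>(constants);
    boolean changed = true;
    // Repeat until no equality can propagate a new constant (fixpoint).
    while (changed) {
      changed = false;
      for (String[] eq : joinEqualities) {
        changed |= propagate(known, eq[0], eq[1]);
        changed |= propagate(known, eq[1], eq[0]);
      }
    }
    return known;
  }

  // Copies the constant bound to 'from' onto 'to' if 'to' has none yet.
  private static boolean propagate(Map<String, String> known, String from, String to) {
    String value = known.get(from);
    if (value == null || known.containsKey(to)) {
      return false;
    }
    known.put(to, value);
    return true;
  }

  public static void main(String[] args) {
    List<String[]> joins = new ArrayList<>();
    joins.add(new String[] {"invites.ds", "invites2.ds"});
    Map<String, String> where = new LinkedHashMap<>();
    where.put("invites.ds", "'2011-01-01'");
    // prints {invites.ds='2011-01-01', invites2.ds='2011-01-01'}
    System.out.println(infer(joins, where));
  }
}
```

With the inferred entry for invites2.ds, both tables in the example query would carry a partition predicate, which is what the strict-mode check is looking for.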





Re: Review Request: rcfilecat 16x performance improvement

2011-08-11 Thread Tim Armstrong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/
---

(Updated 2011-08-12 03:56:39.298860)


Review request for hive, Yongqiang He, Ning Zhang, and namit jain.


Changes
---

Turned magic numbers into named constants, enable output buffering only after 
arguments processed.


Summary
---

This patch improves rcfilecat performance enormously: throughput increased from 
0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements 
I made to get to this performance:

Initial:
0.32 MB/s

Change System.out to use bigger buffer (not line buffered)
1.7MB/s

Unchecked Get:
1.75MB/s

Use StringBuilder to construct each row before writing output.
3.7MB/s

Streamline decoding:
4.16 MB/s

Use StringBuilder to buffer multiple lines:
5 MB/s

Tuning buffer sizes:
5.15 MB/s


I also added a --verbose mode which writes progress updates to stderr.


This addresses bug HIVE-2370.
https://issues.apache.org/jira/browse/HIVE-2370


Diffs (updated)
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 

Diff: https://reviews.apache.org/r/1474/diff


Testing
---

Used diff to check output was same as old version of RCFileCat


Thanks,

Tim



[jira] [Updated] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated HIVE-2370:


Attachment: rcfilecat_2011-08-11b.patch

 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch, rcfilecat_2011-08-11b.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be < 0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.





[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly

2011-08-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083903#comment-13083903
 ] 

jirapos...@reviews.apache.org commented on HIVE-2370:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1474/
---

(Updated 2011-08-12 03:56:39.298860)


Review request for hive, Yongqiang He, Ning Zhang, and namit jain.


Changes
---

Turned magic numbers into named constants; output buffering is now enabled 
only after the arguments have been processed.


Summary
---

This patch improves rcfilecat performance enormously: throughput increased from 
0.32 MB/s to 5.15 MB/s on one benchmark, a 16x speedup. I made a number of 
improvements to reach this performance:

Initial:
0.32 MB/s

Change System.out to use a bigger buffer (not line-buffered):
1.7 MB/s

Unchecked Get:
1.75 MB/s

Use StringBuilder to construct each row before writing output:
3.7 MB/s

Streamline decoding:
4.16 MB/s

Use StringBuilder to buffer multiple lines:
5 MB/s

Tuning buffer sizes:
5.15 MB/s
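
The two output-side steps above (a larger, non-line-buffered stdout and a StringBuilder that batches several rows per write) can be sketched roughly as follows. This is an illustration only, not the code from the attached patch: the class name BufferedRowWriter, both buffer sizes, and the tab-separated row format are assumptions made for the example.

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.List;

// Hypothetical helper for illustration; not taken from RCFileCat.java.
class BufferedRowWriter {
  // Assumed sizes; the patch tunes its own values, which are not shown in this thread.
  static final int STDOUT_BUFFER_SIZE = 128 * 1024;
  static final int FLUSH_THRESHOLD_CHARS = 64 * 1024;

  /** Wraps the raw stdout descriptor in a large buffer with autoflush disabled. */
  static PrintStream bufferedStdout() {
    return new PrintStream(
        new BufferedOutputStream(new FileOutputStream(FileDescriptor.out), STDOUT_BUFFER_SIZE),
        false);
  }

  /** Builds rows in a StringBuilder and hands many rows to the stream per write. */
  static void writeRows(List<String[]> rows, PrintStream out) {
    StringBuilder buf = new StringBuilder();
    for (String[] row : rows) {
      for (int i = 0; i < row.length; i++) {
        if (i > 0) {
          buf.append('\t'); // assumed column separator
        }
        buf.append(row[i]);
      }
      buf.append('\n');
      // Write only once enough text has accumulated, then reuse the builder.
      if (buf.length() >= FLUSH_THRESHOLD_CHARS) {
        out.print(buf);
        buf.setLength(0);
      }
    }
    out.print(buf); // write whatever is left
    out.flush();
  }
}
```

A caller would obtain the stream once with BufferedRowWriter.bufferedStdout() and pass it to writeRows; the point of the design is that each row no longer triggers its own small write to the underlying stream.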


I also added a --verbose mode which writes progress updates to stderr.


This addresses bug HIVE-2370.
https://issues.apache.org/jira/browse/HIVE-2370


Diffs (updated)
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 

Diff: https://reviews.apache.org/r/1474/diff


Testing
---

Used diff to check that the output was the same as that of the old version of RCFileCat.


Thanks,

Tim



 Improve RCFileCat performance significantly
 ---

 Key: HIVE-2370
 URL: https://issues.apache.org/jira/browse/HIVE-2370
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor
 Attachments: rcfilecat_2011-08-11.patch, rcfilecat_2011-08-11b.patch


 The rcfilecat utility is extraordinarily slow: the throughput can be under 0.5 
 MB/s of compressed RCFile.  We can implement a much faster version to enable 
 faster export of data from Hive.





[jira] [Commented] (HIVE-2171) Allow custom serdes to set field comments

2011-08-11 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083907#comment-13083907
 ] 

John Sichi commented on HIVE-2171:
--

+1.  Will commit when tests pass.


 Allow custom serdes to set field comments
 -

 Key: HIVE-2171
 URL: https://issues.apache.org/jira/browse/HIVE-2171
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Attachments: HIVE-2171-2.patch, HIVE-2171.patch


 Currently, while serde implementations can set a field's name, they can't set 
 its comment.  These are set in the metastore utils to {{(from 
 deserializer)}}.  For those serdes that can provide meaningful comments for a 
 field, they should be propagated to the table description.  These 
 serde-provided comments could be prepended to (from deserializer) if others 
 feel that's a meaningful distinction.  This change involves updating 
 {{StructField}} to support a (possibly null) comment field and then 
 propagating this change out to the myriad places {{StructField}} is thrown 
 around.
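
A rough sketch of the idea described above: a field descriptor that carries an optional comment, with a fallback to the placeholder when the serde supplies none. The names CommentedField and FieldComments are hypothetical stand-ins, not Hive's actual StructField, and the exact placeholder string is an assumption.

```java
// Hypothetical, simplified stand-in for a struct field descriptor.
interface CommentedField {
  String getFieldName();

  /** Comment supplied by the serde; may be null when none is available. */
  String getFieldComment();
}

// Hypothetical helper showing how a table description could pick the comment.
class FieldComments {
  // Assumed placeholder text, based on the wording in the issue description.
  static final String FROM_DESERIALIZER = "from deserializer";

  /** Returns the serde-provided comment, or the placeholder if there is none. */
  static String describe(CommentedField field) {
    String comment = field.getFieldComment();
    return (comment == null || comment.isEmpty()) ? FROM_DESERIALIZER : comment;
  }
}
```

Allowing null keeps existing serdes working unchanged: they simply report no comment and the table description shows the placeholder as before.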





[jira] [Commented] (HIVE-2156) Improve error messages emitted during task execution

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083953#comment-13083953
 ] 

Hudson commented on HIVE-2156:
--

Integrated in Hive-trunk-h0.21 #890 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/890/])
HIVE-2156. Improve error messages emitted during task execution (Syed S. 
Albiz via Ning Zhang)

nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156928
Files : 
* /hive/trunk/ql/src/test/templates/TestNegativeCliDriver.vm
* /hive/trunk/ql/src/test/results/clientnegative/script_broken_pipe1.q.out
* /hive/trunk/ql/src/test/results/clientnegative/script_error.q.out
* /hive/trunk/ql/src/test/results/clientnegative/script_broken_pipe2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/script_broken_pipe3.q.out
* /hive/trunk/ql/src/test/results/clientnegative/minimr_broken_pipe.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/results/clientpositive/mapjoin_hook.q.out
* /hive/trunk/conf/hive-default.xml
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java
* /hive/trunk/ql/src/test/results/clientnegative/dyn_part3.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java
* /hive/trunk/ql/src/test/results/clientnegative/udf_test_error.q.out
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/test/queries/clientnegative/minimr_broken_pipe.q
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java
* /hive/trunk/ql/src/test/results/clientnegative/index_compact_size_limit.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join25.q.out
* /hive/trunk/contrib/src/test/results/clientnegative/case_with_row_sequence.q.out
* /hive/trunk/ql/src/test/results/clientnegative/udf_test_error_reduce.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java
* /hive/trunk/ql/src/test/results/clientnegative/index_compact_entry_limit.q.out
* /hive/trunk/ql/src/test/results/clientnegative/udf_reflect_neg.q.out
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
* /hive/trunk/build-common.xml


 Improve error messages emitted during task execution
 

 Key: HIVE-2156
 URL: https://issues.apache.org/jira/browse/HIVE-2156
 Project: Hive
  Issue Type: Improvement
Reporter: Syed S. Albiz
Assignee: Syed S. Albiz
 Fix For: 0.8.0

 Attachments: HIVE-2156.1.patch, HIVE-2156.10.patch, 
 HIVE-2156.11.patch, HIVE-2156.12.patch, HIVE-2156.13.patch, 
 HIVE-2156.2.patch, HIVE-2156.4.patch, HIVE-2156.8.patch, HIVE-2156.9.patch


 Follow-up to HIVE-1731
 A number of issues were related to reporting errors from task execution and 
 surfacing these in a more useful form.
 Currently, only a cryptic message containing "Execution Error", a return 
 code, and the class name of the task is emitted.
 The most useful log messages here are emitted to the local logs, which can be 
 found through jobtracker. Having either a pointer to these logs as part of 
 the error message or the actual content would improve the usefulness 
 substantially. It may also warrant looking into how the underlying error 
 reporting through Hadoop is done and if more information can be propagated up 
 from there.
 Specific issues raised in  HIVE-1731:
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 * issue was in regexp_extract syntax
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 * tried: desc table_does_not_exist;





[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083954#comment-13083954
 ] 

Hudson commented on HIVE-2322:
--

Integrated in Hive-trunk-h0.21 #890 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/890/])
HIVE-2322. Add ColumnarSerDe to the list of native SerDes (Sohan Jain via 
pauly)

pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156931
Files : 
* /hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/combine3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/rcfile_default_format.q.out
* /hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_partition_format_loc.q.out
* /hive/trunk/ql/src/test/results/clientpositive/index_compact_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/rcfile_bigdata.q.out
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java
* /hive/trunk/ql/src/test/results/clientpositive/index_compact_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge_stats.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out
* /hive/trunk/ql/src/test/results/clientpositive/index_bitmap_rc.q.out
* /hive/trunk/ql/src/test/results/clientpositive/rcfile_columnar.q.out
* /hive/trunk/ql/src/test/results/clientpositive/index_creation.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_1.q.out
* /hive/trunk/ql/src/test/queries/clientpositive/sample_islocalmode_hook.q
* /hive/trunk/ql/src/test/results/clientpositive/columnarserde_create_shortcut.q.out


 Add ColumnarSerDe to the list of native SerDes
 --

 Key: HIVE-2322
 URL: https://issues.apache.org/jira/browse/HIVE-2322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Serializers/Deserializers
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, 
 HIVE-2322.4.patch, HIVE-2322.5.patch


 We store metadata about ColumnarSerDe in the metastore, so it should be 
 considered a native SerDe.  Then, column information can be retrieved from 
 the metastore instead of from deserialization.
 Currently, for non-native SerDes, column comments are only shown as "from 
 deserializer".  Adding ColumnarSerDe to the list of native SerDes will 
 persist column comments.  See HIVE-2171 for persisting the column comments of 
 custom SerDes.
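
The native/non-native distinction described above amounts to a membership check on the SerDe class name. The sketch below is only an illustration under assumptions: the class NativeSerDes, the method name, and the contents of the set are hypothetical and are not taken from the committed patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not the code in SerDeUtils.java.
class NativeSerDes {
  // Illustrative entries only; the real list of native SerDes may differ.
  private static final Set<String> NATIVE = new HashSet<String>(Arrays.asList(
      "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"));

  /**
   * True when columns (and their comments) can be read from the metastore;
   * false when they must be obtained from the deserializer instead.
   */
  static boolean columnsComeFromMetastore(String serdeClassName) {
    return serdeClassName != null && NATIVE.contains(serdeClassName);
  }
}
```

Under this sketch, adding ColumnarSerDe to the set is what switches its tables from deserializer-derived columns to metastore-stored columns, which is where the comments are persisted.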
