[jira] [Updated] (HIVE-2110) Hive Client is indefinitely waiting for reading from Socket
[ https://issues.apache.org/jira/browse/HIVE-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated HIVE-2110:
----------------------------------
Fix Version/s: 0.8.0
Status: Patch Available (was: Open)

Hive Client is indefinitely waiting for reading from Socket
-----------------------------------------------------------
Key: HIVE-2110
URL: https://issues.apache.org/jira/browse/HIVE-2110
Project: Hive
Issue Type: Bug
Components: JDBC
Affects Versions: 0.5.0
Environment: Hadoop 0.20.1, Hive 0.5.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Prasad Mujumdar
Fix For: 0.8.0

Hive Client is indefinitely waiting for reading from Socket. Thread dump is added below. Cause: in the HiveClient, when the client socket is created, the read timeout is set to 0, so the socket will wait indefinitely when the machine where Hive Server is running is shut down or the network is unplugged. This may not happen if the HiveServer alone is killed or shut down gracefully; in that case the client gets a connection reset exception.
Code in HiveConnection:
{noformat}
transport = new TSocket(host, port);
TProtocol protocol = new TBinaryProtocol(transport);
client = new HiveClient(protocol);
{noformat}

On the client side, the query is sent and the client waits for the response:
{noformat}
send_execute(query, id);
recv_execute(); // place where the client starts waiting
{noformat}

Thread dump:
{noformat}
"main" prio=10 tid=0x40111000 nid=0x3641 runnable [0x7f0d73f29000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	- locked <0x7f0d5d3f0828> (a java.io.BufferedInputStream)
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:125)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
	at org.apache.hadoop.hive.service.ThriftHive$Client.recv_execute(ThriftHive.java:130)
	at org.apache.hadoop.hive.service.ThriftHive$Client.execute(ThriftHive.java:109)
	- locked <0x7f0d5d3f0878> (a org.apache.thrift.transport.TSocket)
	at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:218)
	at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:154)
{noformat}

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
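The hang described above comes from a blocking read with no deadline. A minimal sketch with plain JDK sockets (not the actual HIVE-2110 patch) shows the mechanism a fix relies on: a read timeout of 0 blocks forever, while a positive `SO_TIMEOUT` turns the indefinite wait into a `SocketTimeoutException`. Thrift's `TSocket` wraps the same JDK socket and exposes a timeout as well.

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    /**
     * Connects to a server that accepts but never sends data, then attempts
     * a blocking read with the given SO_TIMEOUT. Returns true if the read
     * timed out. With a timeout of 0 (the default) the read would block
     * forever -- the hang described in HIVE-2110.
     */
    public static boolean readTimesOut(int timeoutMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket serverSide = server.accept()) { // accepted, never written to
            client.setSoTimeout(timeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks until data arrives or the timeout fires
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + readTimesOut(500));
    }
}
```

With a bounded timeout the client can surface an error to the caller instead of hanging in `recv_execute()` indefinitely.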
[jira] [Updated] (HIVE-2110) Hive Client is indefinitely waiting for reading from Socket
[ https://issues.apache.org/jira/browse/HIVE-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Mujumdar updated HIVE-2110:
----------------------------------
Attachment: HIVE-2110.patch
Jenkins build is back to normal : Hive-trunk-h0.21 #887
See https://builds.apache.org/job/Hive-trunk-h0.21/887/changes
[jira] [Commented] (HIVE-2346) Add hooks to run when execution fails.
[ https://issues.apache.org/jira/browse/HIVE-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083013#comment-13083013 ]

Hudson commented on HIVE-2346:
------------------------------
Integrated in Hive-trunk-h0.21 #887 (See [https://builds.apache.org/job/Hive-trunk-h0.21/887/])
HIVE-2346. Add hooks to run when execution fails. (Kevin Wilfong via Ning Zhang)
nzhang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156480
Files:
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/conf/hive-default.xml

Add hooks to run when execution fails.
--------------------------------------
Key: HIVE-2346
URL: https://issues.apache.org/jira/browse/HIVE-2346
Project: Hive
Issue Type: Improvement
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
Fix For: 0.8.0
Attachments: HIVE-2346.1.patch.txt, HIVE-2346.2.patch.txt, HIVE-2346.3.patch.txt

Currently, when a query fails, the post-execution hooks are not run. Adding hooks to be run when a query fails could allow for better logging, etc.
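The pattern HIVE-2346 adds can be sketched as a driver that invokes a configurable list of hooks when execution throws, rather than skipping hooks entirely. The types below (`HookContext`, `ExecuteHook`) are deliberately simplified stand-ins for Hive's real hook interfaces, not the actual patch code.

```java
import java.util.ArrayList;
import java.util.List;

public class FailureHookDemo {
    /** Simplified stand-in for Hive's HookContext. */
    static class HookContext {
        final String query;
        final Exception failure;
        HookContext(String query, Exception failure) { this.query = query; this.failure = failure; }
    }

    /** Simplified stand-in for a configurable execution hook. */
    interface ExecuteHook { void run(HookContext ctx); }

    static final List<String> LOG = new ArrayList<>();

    /**
     * Runs a query body and, on failure, invokes every configured failure
     * hook with the context, instead of returning without running any hooks.
     */
    public static boolean execute(String query, Runnable body, List<ExecuteHook> failureHooks) {
        try {
            body.run();
            return true;
        } catch (Exception e) {
            for (ExecuteHook hook : failureHooks) {
                hook.run(new HookContext(query, e));
            }
            return false;
        }
    }

    public static void main(String[] args) {
        ExecuteHook logHook =
                ctx -> LOG.add("query failed: " + ctx.query + " (" + ctx.failure.getMessage() + ")");
        execute("select * from missing_table",
                () -> { throw new RuntimeException("table not found"); },
                List.of(logHook));
        System.out.println(LOG);
    }
}
```

In Hive itself the hook classes are named in configuration (alongside the existing pre/post-execution hook settings) and instantiated by the Driver; the sketch only illustrates the control flow.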
[jira] [Assigned] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
[ https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam reassigned HIVE-2181:
--------------------------------------
Assignee: Chinna Rao Lalam

Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
----------------------------------------------------------------------
Key: HIVE-2181
URL: https://issues.apache.org/jira/browse/HIVE-2181
Project: Hive
Issue Type: Bug
Components: Server Infrastructure
Affects Versions: 0.8.0
Environment: Suse linux, Hadoop 20.1, Hive 0.8
Reporter: sanoj mathew
Assignee: Chinna Rao Lalam
Priority: Minor
Labels: patch
Fix For: 0.8.0
Attachments: HIVE-2181.patch
Original Estimate: 48h
Remaining Estimate: 48h

Queries currently leave their map outputs under scratch.dir after execution. If the Hive server is stopped, we need not keep the stopped server's map outputs, so we can clear scratch.dir while starting the server. This can improve disk usage.
Minor typo in error message
Hello, I'm new here and I just wanted to report a minor typo in HiveConnection.java (jdbc):

throw new SQLException("Could not establish connecton to " + uri + ": " + e.getMessage(), "08S01");

It seems like there's an "i" missing ;) Hive is nice work BTW! Thanks.

--
Clément Notin
Re: Minor typo in error message
Hey Clément,

Thanks for the report - would you be able to open a JIRA for it (https://issues.apache.org/jira/browse/HIVE)? I'm sure someone will whip up a patch shortly, or if you're interested in contributing, I'd invite you to create one.

Thanks,
Jakob

2011/8/11 Notin clement.no...@gmail.com:
> Hello, I'm new here and I just wanted to report a minor typo in HiveConnection.java (jdbc):
> throw new SQLException("Could not establish connecton to " + uri + ": " + e.getMessage(), "08S01");
> It seems like there's an "i" missing ;) Hive is nice work BTW! Thanks.
> --
> Clément Notin
[jira] [Updated] (HIVE-2315) DatabaseMetadata.getColumns() does not return partition column names for a table
[ https://issues.apache.org/jira/browse/HIVE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated HIVE-2315:
-------------------------------
Attachment: (was: HIVE-2315_part2.patch)

DatabaseMetadata.getColumns() does not return partition column names for a table
---------------------------------------------------------------------------------
Key: HIVE-2315
URL: https://issues.apache.org/jira/browse/HIVE-2315
Project: Hive
Issue Type: Bug
Components: JDBC
Affects Versions: 0.7.1
Reporter: Mythili Gopalakrishnan
Assignee: Patrick Hunt
Priority: Critical
Fix For: 0.8.0
Attachments: HIVE-2315.patch

The getColumns() method of DatabaseMetadata in the Hive JDBC driver does not return the partition column names, whereas from the Hive CLI a 'describe tablename' returns all columns, including the partition columns. It would be nice if getColumns() returned all columns.
[jira] [Updated] (HIVE-2315) DatabaseMetadata.getColumns() does not return partition column names for a table
[ https://issues.apache.org/jira/browse/HIVE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated HIVE-2315:
-------------------------------
Attachment: HIVE-2315.patch

Updated to a single patch with the fix and both sets of tests.
Review Request: HIVE-2315 DatabaseMetadata.getColumns() does not return partition column names for a table
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1468/
---

Review request for hive and Carl Steinbach.

Summary
-------
This patch fixes the problem and adds a couple of tests.

This addresses bug HIVE-2315. https://issues.apache.org/jira/browse/HIVE-2315

Diffs
-----
jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDatabaseMetaData.java d570fca
jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java d72cf43

Diff: https://reviews.apache.org/r/1468/diff

Testing
-------
Unit tests pass; a user also verified it fixed the issue they were seeing.

Thanks,
Patrick
[jira] [Updated] (HIVE-2315) DatabaseMetadata.getColumns() does not return partition column names for a table
[ https://issues.apache.org/jira/browse/HIVE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated HIVE-2315:
-------------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-2315) DatabaseMetadata.getColumns() does not return partition column names for a table
[ https://issues.apache.org/jira/browse/HIVE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083257#comment-13083257 ]

jirapos...@reviews.apache.org commented on HIVE-2315:
-----------------------------------------------------
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1468/
Re: Minor typo in error message
It's done: HIVE-2369 https://issues.apache.org/jira/browse/HIVE-2369 (I didn't have the will to create an account; it's done now!)

2011/8/11 Jakob Homan jgho...@gmail.com
> Hey Clément- Thanks for the report - would you be able to open a JIRA for it (https://issues.apache.org/jira/browse/HIVE)? I'm sure someone will whip up a patch shortly, or if you're interested in contributing, I'd invite you to create one. Thanks, Jakob

--
Clément Notin
[jira] [Created] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)
Minor typo in error message in HiveConnection.java (JDBC)
---------------------------------------------------------
Key: HIVE-2369
URL: https://issues.apache.org/jira/browse/HIVE-2369
Project: Hive
Issue Type: Bug
Components: JDBC
Affects Versions: 0.7.1, 0.8.0
Environment: Linux
Reporter: Clément Notin
Priority: Trivial

There is a minor typo in HiveConnection.java (jdbc):
{code}throw new SQLException("Could not establish connecton to " + uri + ": " + e.getMessage(), "08S01");{code}
It seems like there's an "i" missing. I know it's a very minor typo, but I report it anyway. I won't attach a patch because it would be too long for me to SVN checkout just for 1 letter.
[jira] [Updated] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)
[ https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clément Notin updated HIVE-2369:
--------------------------------
Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)
[ https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clément Notin updated HIVE-2369:
--------------------------------
Status: Patch Available (was: Open)

Easy patch.
[jira] [Commented] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)
[ https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083279#comment-13083279 ]

Clément Notin commented on HIVE-2369:
-------------------------------------
I wrote the patch on GitHub and made a pull request. You can get it there.
[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
[ https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083306#comment-13083306 ]

MIS commented on HIVE-2181:
---------------------------
-1 for the issue. What if I'm running multiple Hive servers on different ports on the same machine (with my metastore db on a MySQL server)? If one of the server instances restarts, it would end up deleting the scratch dir, which would affect the other running instances as well. Even if we specify a different scratch dir for each of the instances, I doubt the value added by this property.
[hive] Minor typo connecton -> connection. (#3)
Fixes HIVE-2369 -- Reply to this email directly or view it on GitHub: https://github.com/apache/hive/pull/3
[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.
[ https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083401#comment-13083401 ]

Chinna Rao Lalam commented on HIVE-2181:
----------------------------------------
Hi MIS, thanks for the point. As you said, if we have multiple instances with the same scratch dir on the same machine it won't help, but in that case giving a different value for the scratch dir for each instance may help (I will double-check this point). I will introduce one property for this, like hive.start.cleanup.scratchdir, and the cleanup can be triggered based on this property's value. By default it will be turned off; if cleanup needs to be done while starting the server, turn it on.
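The opt-in cleanup proposed above can be sketched with JDK file APIs. This is not the HIVE-2181 patch; the property name `hive.start.cleanup.scratchdir` is the one proposed in the comment, and the flag defaults to off so co-located server instances sharing a scratch dir are unaffected unless the operator opts in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class ScratchDirCleanup {
    /**
     * Deletes the contents of the scratch directory at server startup, but
     * only when the cleanup flag (hive.start.cleanup.scratchdir in the
     * proposal above) is enabled. The scratch directory itself is kept.
     */
    public static void cleanupOnStart(Path scratchDir, boolean cleanupEnabled)
            throws IOException {
        if (!cleanupEnabled || !Files.exists(scratchDir)) return;
        try (Stream<Path> walk = Files.walk(scratchDir)) {
            // Reverse order deletes files before their parent directories.
            walk.sorted(Comparator.reverseOrder())
                .filter(p -> !p.equals(scratchDir))
                .forEach(p -> {
                    try { Files.delete(p); }
                    catch (IOException e) { throw new UncheckedIOException(e); }
                });
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hive-scratch");
        Files.createFile(dir.resolve("map_output_0"));
        cleanupOnStart(dir, true);
        try (Stream<Path> s = Files.list(dir)) {
            System.out.println("entries left: " + s.count()); // 0 when enabled
        }
    }
}
```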
[jira] [Updated] (HIVE-2344) filter is removed due to regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-2344:
-----------------------------
Resolution: Fixed
Release Note: When predicate pushdown is enabled, Hive would previously incorrectly push down predicates on non-deterministic function invocations when those were indirectly referenced via a nested SELECT list rather than directly in the filter expression. After this change, Hive no longer pushes down filters over indirect references to function invocations of any kind (regardless of determinism). Note that in Hive, even builtin operators such as + and CAST are treated as function invocations.
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Amareshwari!

filter is removed due to regression of HIVE-1538
-------------------------------------------------
Key: HIVE-2344
URL: https://issues.apache.org/jira/browse/HIVE-2344
Project: Hive
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: He Yongqiang
Assignee: Amareshwari Sriramadasu
Fix For: 0.8.0
Attachments: hive-patch-2344-2.txt, hive-patch-2344.txt, ppd_udf_col.q.out.txt

select * from (
  select type_bucket, randum123
  from (SELECT *, cast(rand() as double) AS randum123 FROM tbl where ds = ...) a
  where randum123 <= 0.1
) s where s.randum123 > 0.1 limit 20;

This is returning results... and

explain select type_bucket, randum123 from (SELECT *, cast(rand() as double) AS randum123 FROM tbl where ds = ...) a where randum123 <= 0.1

shows that there is no filter.
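Why pushing a filter past a non-deterministic expression is wrong can be shown with a small standalone simulation (this is plain Java modeling the query shapes above, not Hive code). In the correct plan, `rand()` is evaluated once per row, so the contradictory filters `r <= 0.1` and `r > 0.1` can never both hold; if the outer predicate is pushed down past the SELECT, the non-deterministic function is effectively re-evaluated and rows leak through, matching the bug report.

```java
import java.util.Random;

public class PushdownDemo {
    /**
     * Correct plan: rand() is evaluated once per row; the inner filter keeps
     * r <= 0.1 and the outer filter keeps r > 0.1, so no row can match.
     */
    public static int correctPlanMatches(int rows, long seed) {
        Random rand = new Random(seed);
        int matches = 0;
        for (int i = 0; i < rows; i++) {
            double r = rand.nextDouble();           // evaluated exactly once
            if (r <= 0.1 && r > 0.1) matches++;     // logically impossible
        }
        return matches;
    }

    /**
     * Buggy pushdown: the outer predicate is pushed past the SELECT, so the
     * non-deterministic rand() is re-evaluated and the two filters compare
     * different values.
     */
    public static int buggyPushdownMatches(int rows, long seed) {
        Random rand = new Random(seed);
        int matches = 0;
        for (int i = 0; i < rows; i++) {
            boolean innerPass = rand.nextDouble() <= 0.1; // first evaluation
            boolean outerPass = rand.nextDouble() > 0.1;  // second, fresh draw
            if (innerPass && outerPass) matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("correct plan matches: " + correctPlanMatches(10000, 42)); // always 0
        System.out.println("buggy pushdown matches: " + buggyPushdownMatches(10000, 42));
    }
}
```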
HIVE-1538/HIVE-2344
I just committed the fix from Amareshwari, so after this gets pushed, it should be possible to back out the conf changes which were applied to avoid the bug from HIVE-1538.

Read the release notes I added on HIVE-2344 and chime in with a new JIRA issue if you think there are cases where it's important to do finer discrimination in what kinds of SELECT expressions to allow for ppd... in general it's a cost-based optimizer problem.

As an example, consider

select * from (select f(x,y) as z from t) s where z > 3;

Before HIVE-1538, there was a bug where we would push down f(x,y) > 3 even when f was non-deterministic. HIVE-1538 made that bug much more obvious. HIVE-2344 fixes it, but also prevents the pushdown even in cases where f is deterministic. This is good in some cases (e.g. when f is expensive to compute and the filter selectivity is poor), but could be bad in others (e.g. when f is something simple like a CAST and the filter is highly selective).

JVS
Re: HIVE-1538/HIVE-2344
On Aug 11, 2011, at 1:21 PM, wrote: I just committed the fix from Amareshwari, so after this gets pushed, it should be possible to back out the conf changes which were applied to avoid the bug from HIVE-1538. (Oops, ignore this part...no conf changes were applied in Hive source, so this was Facebook-specific.) JVS
Running Hive from Eclipse
Hi folks,

I am trying to run Hive from Eclipse. I've set it up correctly and it is building the jars and such. However, I face exceptions when I try to run Hive queries like "show tables" etc. There has been a discussion on this in the mailing list previously, but no solution was provided. It runs perfectly from the command line. I am making a few changes to the Hive source, and every time I need to jar it from the command line and run it. Is there some way to run it directly from Eclipse?

Please help,
Thanks,
JS
Re: Running Hive from Eclipse
Hi John,

Can you please include the error messages/exceptions that you're encountering? Thanks.

Carl

On Thu, Aug 11, 2011 at 1:40 PM, john smith js1987.sm...@gmail.com wrote:
> Hi folks, I am trying to run Hive from Eclipse. I've set it up correctly and it is building the jars and such. However, I face exceptions when I try to run Hive queries like "show tables" etc. There has been a discussion on this in the mailing list previously, but no solution was provided. It runs perfectly from the command line. I am making a few changes to the Hive source, and every time I need to jar it from the command line and run it. Is there some way to run it directly from Eclipse? Please help, Thanks, JS
[jira] [Updated] (HIVE-1360) Allow UDFs to access constant parameter values at compile time
[ https://issues.apache.org/jira/browse/HIVE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Chang updated HIVE-1360:
---------------------------------
Attachment: HIVE-1360.patch

Allow UDFs to access constant parameter values at compile time
--------------------------------------------------------------
Key: HIVE-1360
URL: https://issues.apache.org/jira/browse/HIVE-1360
Project: Hive
Issue Type: Improvement
Components: Query Processor, UDF
Affects Versions: 0.5.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Attachments: HIVE-1360.patch

UDFs should be able to access constant parameter values at compile time.
[jira] [Updated] (HIVE-1360) Allow UDFs to access constant parameter values at compile time
[ https://issues.apache.org/jira/browse/HIVE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Chang updated HIVE-1360:
---------------------------------
Status: Patch Available (was: Open)

HIVE-1360. It has been a long-standing request for UDFs to be able to access constant parameter values. This not only enables significant performance improvement possibilities, it also allows for fundamentally richer behavior, such as allowing the output type of a UDF to depend on its inputs. The strategy in this diff is to introduce the notion of a ConstantObjectInspector, like a regular ObjectInspector except that it encapsulates a constant value and knows what this constant value is. These COIs are created through a factory method by ExprNodeConstantDesc during plan generation, hence UDFs will be able to capture these constant values during the initialize phase. Furthermore, because these ConstantObjectInspectors are simply subinterfaces of ObjectInspector, UDFs which are not constant-aware receive ObjectInspectors which also implement the same interfaces they are used to, so no special handling needs to be done for existing UDFs. An example UDF which uses this new functionality is also included in this diff. NAMED_STRUCT is like STRUCT except that it also allows users to specify the names of the fields of the struct, something previously not possible because the names of the fields must be known at compile time.

Also see this pull request: https://github.com/apache/hive/pull/2
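The ConstantObjectInspector idea described above can be sketched with toy types. These interfaces are deliberately simplified stand-ins for Hive's real inspector hierarchy (which lives in org.apache.hadoop.hive.serde2): the constant-carrying inspector is a subinterface, so constant-unaware UDFs see an ordinary ObjectInspector, while constant-aware UDFs can detect it in initialize() and read the value at plan time.

```java
public class ConstantOIDemo {
    // Minimal stand-ins for Hive's ObjectInspector hierarchy (hypothetical
    // simplification, not the real serde2 interfaces).
    interface ObjectInspector { String getTypeName(); }

    interface ConstantObjectInspector extends ObjectInspector {
        Object getWritableConstantValue();
    }

    static class StringOI implements ObjectInspector {
        public String getTypeName() { return "string"; }
    }

    static class ConstantStringOI extends StringOI implements ConstantObjectInspector {
        private final String value;
        ConstantStringOI(String value) { this.value = value; }
        public Object getWritableConstantValue() { return value; }
    }

    /**
     * What a constant-aware UDF can do in initialize(): if the argument's
     * inspector carries a compile-time constant (e.g. a field name for
     * NAMED_STRUCT), read it now; otherwise it must be read per row later.
     */
    static String constantOrNull(ObjectInspector oi) {
        if (oi instanceof ConstantObjectInspector) {
            return (String) ((ConstantObjectInspector) oi).getWritableConstantValue();
        }
        return null; // not a constant; only available at execution time
    }

    public static void main(String[] args) {
        System.out.println(constantOrNull(new ConstantStringOI("field_a"))); // field_a
        System.out.println(constantOrNull(new StringOI()));                  // null
    }
}
```

Because the constant variant is a subtype, existing UDFs keep working unchanged: the instanceof check is purely opt-in.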
[jira] [Assigned] (HIVE-1360) Allow UDFs to access constant parameter values at compile time
[ https://issues.apache.org/jira/browse/HIVE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach reassigned HIVE-1360:
------------------------------------
Assignee: Jonathan Chang (was: Carl Steinbach)

Reassigning to Jonathan. @Jonathan: Can you please upload the patch again and this time click the box that gives license rights to the Apache Foundation? Thanks!
[jira] [Created] (HIVE-2370) Improve RCFileCat performance significantly
Improve RCFileCat performance significantly
-------------------------------------------
Key: HIVE-2370
URL: https://issues.apache.org/jira/browse/HIVE-2370
Project: Hive
Issue Type: Improvement
Components: CLI
Affects Versions: 0.8.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Minor

The rcfilecat utility is extraordinarily slow: throughput can be as low as 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive.
Review Request: HIVE-2242: DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1475/
---

Review request for hive and Paul Yang.

Summary
-------
Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to pre-execution hooks. It should also include all matches from partial specifications.

E.g., suppose you have a table

create table test_table (a string) partitioned by (p1 string, p2 string);
alter table test_table add partition (p1=1, p2=1);
alter table test_table add partition (p1=1, p2=2);
alter table test_table add partition (p1=2, p2=2);

and you run

alter table test_table drop partition (p1=1);

Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get the WriteEntity's with the partitions p1=1/p2=1 and p1=1/p2=2.

This addresses bug HIVE-2242. https://issues.apache.org/jira/browse/HIVE-2242

Diffs
-----
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1140399

Diff: https://reviews.apache.org/r/1475/diff

Testing
-------

Thanks,
Sohan
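The partial-spec matching described above reduces to: a partition matches when every column named in the (possibly partial) spec has the same value in the partition's full spec. A small standalone sketch (plain Java maps, not Hive's metastore types):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartialPartitionSpec {
    /** True if every column named in the partial spec matches the full spec. */
    public static boolean matches(Map<String, String> full, Map<String, String> partial) {
        for (Map.Entry<String, String> e : partial.entrySet()) {
            if (!e.getValue().equals(full.get(e.getKey()))) return false;
        }
        return true;
    }

    /** All partitions of the table that match the (possibly partial) spec. */
    public static List<Map<String, String>> select(
            List<Map<String, String>> partitions, Map<String, String> partial) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> p : partitions) {
            if (matches(p, partial)) out.add(p);
        }
        return out;
    }

    static Map<String, String> spec(String p1, String p2) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("p1", p1);
        m.put("p2", p2);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> parts = List.of(
                spec("1", "1"), spec("1", "2"), spec("2", "2"));
        Map<String, String> partial = new LinkedHashMap<>();
        partial.put("p1", "1");
        // drop partition (p1=1) should match p1=1/p2=1 and p1=1/p2=2
        System.out.println(select(parts, partial).size()); // 2
    }
}
```

The fix amounts to handing the hooks the full `select(...)` result rather than only exact-spec matches.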
[jira] [Commented] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions
[ https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083738#comment-13083738 ] jirapos...@reviews.apache.org commented on HIVE-2242: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1475/ --- Review request for hive and Paul Yang. Summary --- Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to pre-execution hooks. It should also include all matches from partial specifications. E.g., suppose you have a table create table test_table (a string) partitioned by (p1 string, p2 string); alter table test_table add partition (p1=1, p2=1); alter table test_table add partition (p1=1, p2=2); alter table test_table add partition (p1=2, p2=2); and you run alter table test_table drop partition(p1=1); Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get the WriteEntity objects with the partitions p1=1/p2=1 and p1=1/p2=2. This addresses bug HIVE-2242. https://issues.apache.org/jira/browse/HIVE-2242 Diffs - trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1140399 Diff: https://reviews.apache.org/r/1475/diff Testing --- Thanks, Sohan DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions - Key: HIVE-2242 URL: https://issues.apache.org/jira/browse/HIVE-2242 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2242.1.patch Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to pre-execution hooks. It should also include all matches from partial specifications. 
E.g., suppose you have a table {{create table test_table (a string) partitioned by (p1 string, p2 string);}} {{alter table test_table add partition (p1=1, p2=1);}} {{alter table test_table add partition (p1=1, p2=2);}} {{alter table test_table add partition (p1=2, p2=2);}} and you run {{alter table test_table drop partition(p1=1);}} Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get the WriteEntity objects with the partitions p1=1/p2=1 and p1=1/p2=2. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
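The intended matching rule above — a partial specification selects every partition whose values agree on all of the specified keys — can be sketched as a standalone Java snippet. The class and method names here are hypothetical, purely for illustration; the actual fix lives in DDLSemanticAnalyzer.java:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartialSpecMatch {
    // True when every key/value in the (possibly partial) spec agrees
    // with the partition's full key/value map.
    static boolean matches(Map<String, String> spec, Map<String, String> partition) {
        for (Map.Entry<String, String> e : spec.entrySet()) {
            if (!e.getValue().equals(partition.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The three partitions added in the example above.
        List<Map<String, String>> partitions = new ArrayList<>();
        partitions.add(Map.of("p1", "1", "p2", "1"));
        partitions.add(Map.of("p1", "1", "p2", "2"));
        partitions.add(Map.of("p1", "2", "p2", "2"));

        // drop partition(p1=1) gives only a partial spec.
        Map<String, String> dropSpec = Map.of("p1", "1");
        List<Map<String, String>> matched = new ArrayList<>();
        for (Map<String, String> p : partitions) {
            if (matches(dropSpec, p)) {
                matched.add(p);
            }
        }
        // Both p1=1/p2=1 and p1=1/p2=2 should be handed to the hooks.
        System.out.println("matched=" + matched.size());
    }
}
```

Run against the example table, this matches exactly the two partitions the pre-execution hooks were expected to receive.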
[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083751#comment-13083751 ] Paul Yang commented on HIVE-2246: - There have been some issues identified with this patch. We will be doing some additional testing, but we might roll back so that we don't leave trunk in an unstable state. Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch, HIVE-2246.8.patch Note: this patch proposes a schema change, and is therefore incompatible with the current metastore. We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest-growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. Please see the latest review board for additional implementation details. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
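The row-count arithmetic in the description is easy to check with a quick sketch. The workload numbers below are invented purely for illustration (they are not from the JIRA):

```java
public class ColumnRowEstimate {
    public static void main(String[] args) {
        // Assumed workload, for illustration only.
        long partitions = 10_000;
        long tables = 100;
        long avgColumns = 20;

        // Before: each table AND each partition stores its own column list.
        long before = (partitions + tables) * avgColumns;

        // After: partitions share their parent table's Column Descriptor,
        // so only per-table column lists remain.
        long after = tables * avgColumns;

        System.out.println("before=" + before);
        System.out.println("after=" + after);
    }
}
```

With these assumed numbers the COLUMNS table shrinks from 202,000 rows to 2,000, which is why the savings grow with the partition count while the added Column Descriptor cost stays proportional to the (much smaller) table count.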
Review Request: rcfilecat 16x performance improvement
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Summary --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim
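The first two wins listed — replacing line-buffered System.out with a large explicit buffer, and batching rows through a StringBuilder before each write — can be sketched roughly as follows. The buffer sizes and flush threshold here are illustrative, not the values from the patch:

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;

public class BufferedCat {
    public static void main(String[] args) {
        // System.out flushes frequently when attached to a console; wrapping
        // FileDescriptor.out in a large BufferedOutputStream (autoFlush=false)
        // avoids paying a flush per row.
        PrintStream out = new PrintStream(
            new BufferedOutputStream(new FileOutputStream(FileDescriptor.out), 128 * 1024),
            false);

        // Batch several rows into one StringBuilder, then write once.
        StringBuilder buf = new StringBuilder(64 * 1024);
        for (int row = 0; row < 1000; row++) {
            buf.append("col1\tcol2\tcol3\n");
            if (buf.length() > 32 * 1024) {  // flush threshold, illustrative
                out.print(buf);
                buf.setLength(0);
            }
        }
        out.print(buf);  // write any remaining rows
        out.flush();
    }
}
```

The design point is simply amortization: one syscall and one buffer copy per tens of kilobytes instead of per row, which accounts for the jump from 0.32 MB/s to the 1.7-3.7 MB/s range in the numbers above.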
Re: Review Request: rcfilecat 16x performance improvement
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- (Updated 2011-08-11 22:44:48.620762) Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Summary (updated) --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim
[jira] [Updated] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated HIVE-2370: Status: Patch Available (was: Open) Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083765#comment-13083765 ] jirapos...@reviews.apache.org commented on HIVE-2370: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Summary --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated HIVE-2370: Attachment: rcfilecat_2011-08-11.patch Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083766#comment-13083766 ] Tim Armstrong commented on HIVE-2370: - Diff is available on reviewboard: https://reviews.apache.org/r/1474/ Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083768#comment-13083768 ] jirapos...@reviews.apache.org commented on HIVE-2370: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- (Updated 2011-08-11 22:44:48.620762) Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Summary (updated) --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: rcfilecat 16x performance improvement
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/#review1412 --- Great job! Does this number indicate the read and write speed or just the read (including decompression) part? trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3266 can you remove all these TABs? trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3267 make 2048 a static constant variable. - Ning On 2011-08-11 22:44:48, Tim Armstrong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- (Updated 2011-08-11 22:44:48) Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Summary --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim
Re: Review Request: Optimisation for RCFile reading to improve CPU usage.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1309/ --- (Updated 2011-08-11 23:07:05.774106) Review request for hive, Yongqiang He and Ning Zhang. Changes --- Minor change to avoid a compilation problem. Summary --- By tweaking the RCFile$Reader implementation to allow more efficient memory access I was able to reduce CPU usage. I measured the speed required to scan a gzipped RCFile, decompress and assemble into records. CPU time was reduced by about 7% for a full table scan. An improvement of about 2% was realised when a smaller subset of columns (3-5 out of tens) was selected. This addresses bug HIVE-2350. https://issues.apache.org/jira/browse/HIVE-2350 Diffs (updated) - trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1156839 Diff: https://reviews.apache.org/r/1309/diff Testing --- Ran TestRCFile unit test. Manually tested reading from warehouse table. Thanks, Tim
[jira] [Commented] (HIVE-2350) Improve RCFile Read Speed
[ https://issues.apache.org/jira/browse/HIVE-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083793#comment-13083793 ] jirapos...@reviews.apache.org commented on HIVE-2350: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1309/ --- (Updated 2011-08-11 23:07:05.774106) Review request for hive, Yongqiang He and Ning Zhang. Changes --- Minor change to avoid a compilation problem. Summary --- By tweaking the RCFile$Reader implementation to allow more efficient memory access I was able to reduce CPU usage. I measured the speed required to scan a gzipped RCFile, decompress and assemble into records. CPU time was reduced by about 7% for a full table scan. An improvement of about 2% was realised when a smaller subset of columns (3-5 out of tens) was selected. This addresses bug HIVE-2350. https://issues.apache.org/jira/browse/HIVE-2350 Diffs (updated) - trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1156839 Diff: https://reviews.apache.org/r/1309/diff Testing --- Ran TestRCFile unit test. Manually tested reading from warehouse table. Thanks, Tim Improve RCFile Read Speed - Key: HIVE-2350 URL: https://issues.apache.org/jira/browse/HIVE-2350 Project: Hive Issue Type: Improvement Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfile-2011-08-04.diff, rcfile_opt_2011-08-05.diff, rcfile_opt_2011-08-05b.diff, rcfile_opt_2011-08-11.patch Original Estimate: 0h Remaining Estimate: 0h By tweaking the RCFile$Reader implementation to allow more efficient memory access I was able to reduce CPU usage. I measured the speed required to scan a gzipped RCFile, decompress and assemble into records. CPU time was reduced by about 7% for a full table scan. An improvement of about 2% was realised when a smaller subset of columns (3-5 out of tens) was selected. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083792#comment-13083792 ] jirapos...@reviews.apache.org commented on HIVE-2370: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/#review1412 --- Great job! Does this number indicate the read and write speed or just the read (including decompression) part? trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3266 can you remove all these TABs? trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3267 make 2048 a static constant variable. - Ning On 2011-08-11 22:44:48, Tim Armstrong wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1474/ bq. --- bq. bq. (Updated 2011-08-11 22:44:48) bq. bq. bq. Review request for hive, Yongqiang He, Ning Zhang, and namit jain. bq. bq. bq. Summary bq. --- bq. bq. This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: bq. bq. Initial: bq. 0.32 MB/s bq. bq. Change System.out to use bigger buffer (not line buffered) bq. 1.7MB/s bq. bq. Unchecked Get: bq. 1.75MB/s bq. bq. Use StringBuilder to construct each row before writing output. bq. 3.7MB/s bq. bq. Streamline decoding: bq. 4.16 MB/s bq. bq. Use StringBuilder to buffer multiple lines: bq. 5 MB/s bq. bq. Tuning buffer sizes: bq. 5.15 MB/s bq. bq. bq. I also added a --verbose mode which writes progress updates to stderr. bq. bq. bq. This addresses bug HIVE-2370. bq. https://issues.apache.org/jira/browse/HIVE-2370 bq. bq. bq. Diffs bq. - bq. bq. trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 bq. bq. Diff: https://reviews.apache.org/r/1474/diff bq. bq. bq. Testing bq. --- bq. bq. 
Used diff to check output was same as old version of RCFileCat bq. bq. bq. Thanks, bq. bq. Tim bq. bq. Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2350) Improve RCFile Read Speed
[ https://issues.apache.org/jira/browse/HIVE-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated HIVE-2350: Attachment: rcfile_opt_2011-08-11.patch Improve RCFile Read Speed - Key: HIVE-2350 URL: https://issues.apache.org/jira/browse/HIVE-2350 Project: Hive Issue Type: Improvement Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfile-2011-08-04.diff, rcfile_opt_2011-08-05.diff, rcfile_opt_2011-08-05b.diff, rcfile_opt_2011-08-11.patch Original Estimate: 0h Remaining Estimate: 0h By tweaking the RCFile$Reader implementation to allow more efficient memory access I was able to reduce CPU usage. I measured the speed required to scan a gzipped RCFile, decompress and assemble into records. CPU time was reduced by about 7% for a full table scan. An improvement of about 2% was realised when a smaller subset of columns (3-5 out of tens) was selected. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1360) Allow UDFs to access constant parameter values at compile time
[ https://issues.apache.org/jira/browse/HIVE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Chang updated HIVE-1360: - Attachment: HIVE-1360.patch Like the previous patch, except the license is now granted; it also fixes show_functions.q. Allow UDFs to access constant parameter values at compile time -- Key: HIVE-1360 URL: https://issues.apache.org/jira/browse/HIVE-1360 Project: Hive Issue Type: Improvement Components: Query Processor, UDF Affects Versions: 0.5.0 Reporter: Carl Steinbach Assignee: Jonathan Chang Attachments: HIVE-1360.patch, HIVE-1360.patch UDFs should be able to access constant parameter values at compile time. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: Support archiving for multiple partitions if the table is partitioned by multiple columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1259/ --- (Updated 2011-08-11 23:31:47.018762) Review request for hive, Paul Yang and namit jain. Changes --- Fixed configuration (removed hook) Summary --- Allows archiving at a chosen level. When a table is partitioned by ds, hr, min, it allows archiving at the ds level, hr level and min level. Corresponding syntaxes are: ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08'); ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08', hr='11'); ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08', hr='11', min='30'); You cannot do much to archived partitions. You can read them. You cannot write to them / overwrite them. You can drop single archived partitions, but not parts of bigger archives. Diffs (updated) - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1153271 trunk/metastore/if/hive_metastore.thrift 1153271 trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h 1153271 trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp 1153271 trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Constants.java 1153271 trunk/metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_constants.php 1153271 trunk/metastore/src/gen/thrift/gen-py/hive_metastore/constants.py 1153271 trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb 1153271 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ArchiveUtils.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/DummyPartition.java 1153271 
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1153271 trunk/ql/src/test/queries/clientnegative/archive_insert1.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_insert2.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_insert3.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_insert4.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi1.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi2.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi3.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi4.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi5.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi6.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_multi7.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_partspec1.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_partspec2.q PRE-CREATION trunk/ql/src/test/queries/clientnegative/archive_partspec3.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/archive_corrupt.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/archive_multi.q PRE-CREATION trunk/ql/src/test/results/clientnegative/archive1.q.out 1153271 trunk/ql/src/test/results/clientnegative/archive2.q.out 1153271 trunk/ql/src/test/results/clientnegative/archive_insert1.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_insert2.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_insert3.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_insert4.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi1.q.out PRE-CREATION 
trunk/ql/src/test/results/clientnegative/archive_multi2.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi3.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi4.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi5.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi6.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_multi7.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_partspec1.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_partspec2.q.out PRE-CREATION trunk/ql/src/test/results/clientnegative/archive_partspec3.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/archive_corrupt.q.out PRE-CREATION
[jira] [Updated] (HIVE-2278) Support archiving for multiple partitions if the table is partitioned by multiple columns
[ https://issues.apache.org/jira/browse/HIVE-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Kurczych updated HIVE-2278: -- Attachment: HIVE-2278.6.patch Support archiving for multiple partitions if the table is partitioned by multiple columns - Key: HIVE-2278 URL: https://issues.apache.org/jira/browse/HIVE-2278 Project: Hive Issue Type: New Feature Reporter: Namit Jain Assignee: Marcin Kurczych Attachments: HIVE-2278.2.patch, HIVE-2278.3.patch, HIVE-2278.4.patch, HIVE-2278.5.patch, HIVE-2278.5.patch, HIVE-2278.6.patch, hive.2278.1.patch If a table is partitioned by ds,hr it should be possible to archive all the files in ds to reduce the number of files -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2278) Support archiving for multiple partitions if the table is partitioned by multiple columns
[ https://issues.apache.org/jira/browse/HIVE-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Kurczych updated HIVE-2278: -- Status: Open (was: Patch Available) Support archiving for multiple partitions if the table is partitioned by multiple columns - Key: HIVE-2278 URL: https://issues.apache.org/jira/browse/HIVE-2278 Project: Hive Issue Type: New Feature Reporter: Namit Jain Assignee: Marcin Kurczych Attachments: HIVE-2278.2.patch, HIVE-2278.3.patch, HIVE-2278.4.patch, HIVE-2278.5.patch, HIVE-2278.5.patch, HIVE-2278.6.patch, hive.2278.1.patch If a table is partitioned by ds,hr it should be possible to archive all the files in ds to reduce the number of files -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2278) Support archiving for multiple partitions if the table is partitioned by multiple columns
[ https://issues.apache.org/jira/browse/HIVE-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Kurczych updated HIVE-2278: -- Status: Patch Available (was: Open) Support archiving for multiple partitions if the table is partitioned by multiple columns - Key: HIVE-2278 URL: https://issues.apache.org/jira/browse/HIVE-2278 Project: Hive Issue Type: New Feature Reporter: Namit Jain Assignee: Marcin Kurczych Attachments: HIVE-2278.2.patch, HIVE-2278.3.patch, HIVE-2278.4.patch, HIVE-2278.5.patch, HIVE-2278.5.patch, HIVE-2278.6.patch, hive.2278.1.patch If a table is partitioned by ds,hr it should be possible to archive all the files in ds to reduce the number of files -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
HIVE-2282
The unit test sample_islocalmode_hook.q has been failing consistently for me with the diff below. Jenkins builds passed with it so I'm guessing it must be something environmental? Also, Siying, it looks like you committed this, but did not resolve it in JIRA? [junit] 97,98c97,98 [junit] PREHOOK: Output: file:/data/users/jsichi/open/test-trunk/build/ql/scratchdir/hive_2011-08-11_16-34-43_442_7150692818727736391/-mr-1 [junit] 1028 [junit] --- [junit] PREHOOK: Output: file:/data/users/kevinwilfong/trunk/VENDOR.hive/trunk/build/ql/scratchdir/hive_2011-07-22_10-31-04_069_8883954538684297085/-mr-1 [junit] 0
RE: HIVE-2282
Kevin, probably there is still some non-determinism in your test case. Can you carefully examine it? -Original Message- From: John Sichi Sent: Thursday, August 11, 2011 4:44 PM To: Siying Dong; Kevin Wilfong Cc: dev@hive.apache.org Subject: HIVE-2282 The unit test sample_islocalmode_hook.q has been failing consistently for me with the diff below. Jenkins builds passed with it so I'm guessing it must be something environmental? Also, Siying, it looks like you committed this, but did not resolve it in JIRA? [junit] 97,98c97,98 [junit] PREHOOK: Output: file:/data/users/jsichi/open/test-trunk/build/ql/scratchdir/hive_2011-08-11_16-34-43_442_7150692818727736391/-mr-1 [junit] 1028 [junit] --- [junit] PREHOOK: Output: file:/data/users/kevinwilfong/trunk/VENDOR.hive/trunk/build/ql/scratchdir/hive_2011-07-22_10-31-04_069_8883954538684297085/-mr-1 [junit] 0
Re: HIVE-2282
I've seen this issue before; a fix was going to go out as part of someone else's change, but for some reason it hasn't been committed. I've arranged to remove the fix from that change, and I'll make a new JIRA just for the fix. On 8/11/11 4:49 PM, Siying Dong siyin...@fb.com wrote: Kevin, probably there is still some non-determinism in your test case. Can you examine it carefully? -Original Message- From: John Sichi Sent: Thursday, August 11, 2011 4:44 PM To: Siying Dong; Kevin Wilfong Cc: dev@hive.apache.org Subject: HIVE-2282 The unit test sample_islocalmode_hook.q has been failing consistently for me with the diff below. Jenkins builds passed with it so I'm guessing it must be something environmental? Also, Siying, it looks like you committed this, but did not resolve it in JIRA? [junit] 97,98c97,98 [junit] PREHOOK: Output: file:/data/users/jsichi/open/test-trunk/build/ql/scratchdir/hive_2011-08-11_16-34-43_442_7150692818727736391/-mr-1 [junit] 1028 [junit] --- [junit] PREHOOK: Output: file:/data/users/kevinwilfong/trunk/VENDOR.hive/trunk/build/ql/scratchdir/hive_2011-07-22_10-31-04_069_8883954538684297085/-mr-1 [junit] 0
Re: HIVE-2282
Thanks! JVS On Aug 11, 2011, at 4:53 PM, Kevin Wilfong wrote: I've seen this issue before; a fix was going to go out as part of someone else's change, but for some reason it hasn't been committed. I've arranged to remove the fix from that change, and I'll make a new JIRA just for the fix. On 8/11/11 4:49 PM, Siying Dong siyin...@fb.com wrote: Kevin, probably there is still some non-determinism in your test case. Can you examine it carefully? -Original Message- From: John Sichi Sent: Thursday, August 11, 2011 4:44 PM To: Siying Dong; Kevin Wilfong Cc: dev@hive.apache.org Subject: HIVE-2282 The unit test sample_islocalmode_hook.q has been failing consistently for me with the diff below. Jenkins builds passed with it so I'm guessing it must be something environmental? Also, Siying, it looks like you committed this, but did not resolve it in JIRA? [junit] 97,98c97,98 [junit] PREHOOK: Output: file:/data/users/jsichi/open/test-trunk/build/ql/scratchdir/hive_2011-08-11_16-34-43_442_7150692818727736391/-mr-1 [junit] 1028 [junit] --- [junit] PREHOOK: Output: file:/data/users/kevinwilfong/trunk/VENDOR.hive/trunk/build/ql/scratchdir/hive_2011-07-22_10-31-04_069_8883954538684297085/-mr-1 [junit] 0
[jira] [Created] (HIVE-2371) sample_islocalmode_hook.q test is non-deterministic
sample_islocalmode_hook.q test is non-deterministic --- Key: HIVE-2371 URL: https://issues.apache.org/jira/browse/HIVE-2371 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Kevin Wilfong -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2371) sample_islocalmode_hook.q test is non-deterministic
[ https://issues.apache.org/jira/browse/HIVE-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-2371: Attachment: HIVE-2371.1.patch.txt sample_islocalmode_hook.q test is non-deterministic --- Key: HIVE-2371 URL: https://issues.apache.org/jira/browse/HIVE-2371 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2371.1.patch.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: sample_islocalmode_hook.q test is non-deterministic
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1477/ --- Review request for hive and Siying Dong. Summary --- Adding order by to the two queries used to create the test tables makes the test deterministic. This addresses bug HIVE-2371. https://issues.apache.org/jira/browse/HIVE-2371 Diffs - trunk/ql/src/test/queries/clientpositive/sample_islocalmode_hook.q 1156861 trunk/ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out 1156861 Diff: https://reviews.apache.org/r/1477/diff Testing --- I ran the test and verified it passed. I also had a person who had been seeing the test fail due to non-determinism run the test and verify that it passed. Thanks, Kevin
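The fix amounts to adding ORDER BY to the two table-creation queries, so the rows that land in the test tables no longer depend on which map task finishes first. A hypothetical sketch of the pattern (the table name `sih_src` and the LIMIT value are invented for illustration; only `src` is a standard Hive test table):

```sql
-- Without ORDER BY, the rows a CTAS-with-LIMIT query picks depend on which
-- map task returns first, so repeated runs can populate the table differently.
-- ORDER BY forces a deterministic row set (up to ties) before the LIMIT.
CREATE TABLE sih_src AS
SELECT key, value
FROM src
ORDER BY key, value
LIMIT 500;
```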
[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083835#comment-13083835 ] Tim Armstrong commented on HIVE-2370: - I'm not sure exactly what you mean about read and write speeds. I tested it reading a file off a remote DFS instance, redirecting the output to a local file. The time spent writing the output is negligible. The largest part of time is spent doing unicode conversions to get it into a Java CharBuffer, and then writing it to the console. Decompression and deserialisation also take up a large part of CPU time. Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2371) sample_islocalmode_hook.q test is non-deterministic
[ https://issues.apache.org/jira/browse/HIVE-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083837#comment-13083837 ] jirapos...@reviews.apache.org commented on HIVE-2371: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1477/ --- Review request for hive and Siying Dong. Summary --- Adding order by to the two queries used to create the test tables makes the test deterministic. This addresses bug HIVE-2371. https://issues.apache.org/jira/browse/HIVE-2371 Diffs - trunk/ql/src/test/queries/clientpositive/sample_islocalmode_hook.q 1156861 trunk/ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out 1156861 Diff: https://reviews.apache.org/r/1477/diff Testing --- I ran the test and verified it passed. I also had a person who had been seeing the test fail due to non-determinism run the test and verify that it passed. Thanks, Kevin sample_islocalmode_hook.q test is non-deterministic --- Key: HIVE-2371 URL: https://issues.apache.org/jira/browse/HIVE-2371 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2371.1.patch.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: rcfilecat 16x performance improvement
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- (Updated 2011-08-12 00:22:11.295461) Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Changes --- Stripped out whitespace at end of line of old version. Summary --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs (updated) - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim
[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083839#comment-13083839 ] jirapos...@reviews.apache.org commented on HIVE-2370: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- (Updated 2011-08-12 00:22:11.295461) Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Changes --- Stripped out whitespace at end of line of old version. Summary --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs (updated) - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1538) FilterOperator is applied twice with ppd on.
[ https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083847#comment-13083847 ] Hudson commented on HIVE-1538: -- Integrated in Hive-trunk-h0.21 #889 (See [https://builds.apache.org/job/Hive-trunk-h0.21/889/]) HIVE-1538. filter is removed due to regression of HIVE-1538 (Amareshwari Sriramadasu via jvs) jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1156787 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java * /hive/trunk/ql/src/test/results/clientpositive/ppd_udf_col.q.out * /hive/trunk/ql/src/test/queries/clientpositive/ppd_udf_col.q * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java FilterOperator is applied twice with ppd on. Key: HIVE-1538 URL: https://issues.apache.org/jira/browse/HIVE-1538 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.8.0 Attachments: patch-1538-1.txt, patch-1538-2.txt, patch-1538-3.txt, patch-1538-4.txt, patch-1538.txt With hive.optimize.ppd set to true, FilterOperator is applied twice. And it seems second operator is always filtering zero rows. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083854#comment-13083854 ] Paul Yang commented on HIVE-2322: - +1. Tested and will commit. Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, HIVE-2322.4.patch, HIVE-2322.5.patch We store metadata about ColumnarSerDes in the metastore, so it should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are only shown as from deserializer. Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2156) Improve error messages emitted during task execution
[ https://issues.apache.org/jira/browse/HIVE-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2156: - Resolution: Fixed Fix Version/s: 0.8.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Syed! Improve error messages emitted during task execution Key: HIVE-2156 URL: https://issues.apache.org/jira/browse/HIVE-2156 Project: Hive Issue Type: Improvement Reporter: Syed S. Albiz Assignee: Syed S. Albiz Fix For: 0.8.0 Attachments: HIVE-2156.1.patch, HIVE-2156.10.patch, HIVE-2156.11.patch, HIVE-2156.12.patch, HIVE-2156.13.patch, HIVE-2156.2.patch, HIVE-2156.4.patch, HIVE-2156.8.patch, HIVE-2156.9.patch Follow-up to HIVE-1731 A number of issues were related to reporting errors from task execution and surfacing these in a more useful form. Currently a cryptic message with Execution Error and a return code and class name of the task is emitted. The most useful log messages here are emitted to the local logs, which can be found through jobtracker. Having either a pointer to these logs as part of the error message or the actual content would improve the usefulness substantially. It may also warrant looking into how the underlying error reporting through Hadoop is done and if more information can be propagated up from there. Specific issues raised in HIVE-1731: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask * issue was in regexp_extract syntax FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask * tried: desc table_does_not_exist; -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083865#comment-13083865 ] Paul Yang commented on HIVE-2322: - Committed. Thanks Sohan! Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, HIVE-2322.4.patch, HIVE-2322.5.patch We store metadata about ColumnarSerDes in the metastore, so it should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are only shown as from deserializer. Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: rcfilecat 16x performance improvement
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/#review1414 --- trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3271 This should probably be done after we finish processing the command line options. trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3269 1024*1024 should be replaced with a static final variable. trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3270 Another constant that should be converted to a static final. - Carl On 2011-08-12 00:22:11, Tim Armstrong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- (Updated 2011-08-12 00:22:11) Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Summary --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim
[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083870#comment-13083870 ] jirapos...@reviews.apache.org commented on HIVE-2370: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/#review1414 --- trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3271 This should probably be done after we finish processing the command line options. trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3269 1024*1024 should be replaced with a static final variable. trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java https://reviews.apache.org/r/1474/#comment3270 Another constant that should be converted to a static final. - Carl On 2011-08-12 00:22:11, Tim Armstrong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- (Updated 2011-08-12 00:22:11) Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Summary --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1989) recognize transitivity of predicates on join keys
[ https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083887#comment-13083887 ] John Sichi commented on HIVE-1989: -- Usually when we add a new optimization, we add a corresponding conf parameter so that we can disable it if it causes trouble. Add it to HiveConf.java and conf/hive-default.xml recognize transitivity of predicates on join keys - Key: HIVE-1989 URL: https://issues.apache.org/jira/browse/HIVE-1989 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Attachments: HIVE-1989v1.patch Given {noformat} set hive.mapred.mode=strict; create table invites (foo int, bar string) partitioned by (ds string); create table invites2 (foo int, bar string) partitioned by (ds string); select count(*) from invites join invites2 on invites.ds=invites2.ds where invites.ds='2011-01-01'; {noformat} currently an error occurs: {noformat} Error in semantic analysis: No Partition Predicate Found for Alias invites2 Table invites2 {noformat} The optimizer should be able to infer a predicate on invites2 via transitivity. The current lack places a burden on the user to add a redundant predicate, and makes it impossible (at least in strict mode) to create join views where both underlying tables are partitioned (the join select list has to pick one of the tables arbitrarily). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: rcfilecat 16x performance improvement
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- (Updated 2011-08-12 03:56:39.298860) Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Changes --- Turned magic numbers into named constants, enable output buffering only after arguments processed. Summary --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs (updated) - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim
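Two of the steps above (switching System.out to a larger explicit buffer, and batching rows through a StringBuilder before writing) can be sketched in isolation. This is a minimal illustration of the buffering idea only, not the actual RCFileCat patch; the class name, buffer sizes, and fixed row content are invented:

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;

public class BufferedCatSketch {
    // Illustrative sizes only; the real patch arrived at its values by tuning.
    static final int OUT_BUFFER_SIZE = 128 * 1024; // bytes buffered before stdout write
    static final int ROWS_PER_FLUSH = 1000;        // rows batched per print call

    // Build a batch of rows in one StringBuilder instead of printing per line.
    static String formatRows(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("col1\tcol2\tcol3\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A line-buffered System.out flushes on every newline; rewrapping stdout
        // in a large BufferedOutputStream with autoFlush=false avoids that cost.
        PrintStream out = new PrintStream(
            new BufferedOutputStream(
                new FileOutputStream(FileDescriptor.out), OUT_BUFFER_SIZE),
            false);
        int totalRows = 5000;
        for (int done = 0; done < totalRows; done += ROWS_PER_FLUSH) {
            int batch = Math.min(ROWS_PER_FLUSH, totalRows - done);
            out.print(formatRows(batch)); // one write call per batch, not per row
        }
        out.flush(); // push out whatever remains in the buffer
    }
}
```

The same pattern generalizes: fewer, larger writes amortize the per-call overhead of the output stream, which is where the first and largest speedup in the list above came from.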
[jira] [Updated] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated HIVE-2370: Attachment: rcfilecat_2011-08-11b.patch Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch, rcfilecat_2011-08-11b.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2370) Improve RCFileCat performance significantly
[ https://issues.apache.org/jira/browse/HIVE-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083903#comment-13083903 ] jirapos...@reviews.apache.org commented on HIVE-2370: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1474/ --- (Updated 2011-08-12 03:56:39.298860) Review request for hive, Yongqiang He, Ning Zhang, and namit jain. Changes --- Turned magic numbers into named constants, enable output buffering only after arguments processed. Summary --- This patch improves rcfilecat performance enormously: throughput increased from 0.32MB/s to 5.15MB/s on one benchmark: 16x. There were a number of improvements I made to get to this performance: Initial: 0.32 MB/s Change System.out to use bigger buffer (not line buffered) 1.7MB/s Unchecked Get: 1.75MB/s Use StringBuilder to construct each row before writing output. 3.7MB/s Streamline decoding: 4.16 MB/s Use StringBuilder to buffer multiple lines: 5 MB/s Tuning buffer sizes: 5.15 MB/s I also added a --verbose mode which writes progress updates to stderr. This addresses bug HIVE-2370. https://issues.apache.org/jira/browse/HIVE-2370 Diffs (updated) - trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java 1156839 Diff: https://reviews.apache.org/r/1474/diff Testing --- Used diff to check output was same as old version of RCFileCat Thanks, Tim Improve RCFileCat performance significantly --- Key: HIVE-2370 URL: https://issues.apache.org/jira/browse/HIVE-2370 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.8.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Priority: Minor Attachments: rcfilecat_2011-08-11.patch, rcfilecat_2011-08-11b.patch The rcfilecat utility is extraordinarily slow: the throughput can be 0.5 MB/s of compressed RCFile. We can implement a much faster version to enable faster export of data from Hive. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2171) Allow custom serdes to set field comments
[ https://issues.apache.org/jira/browse/HIVE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083907#comment-13083907 ] John Sichi commented on HIVE-2171: -- +1. Will commit when tests pass. Allow custom serdes to set field comments - Key: HIVE-2171 URL: https://issues.apache.org/jira/browse/HIVE-2171 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Jakob Homan Assignee: Jakob Homan Attachments: HIVE-2171-2.patch, HIVE-2171.patch Currently, while serde implementations can set a field's name, they can't set its comment. These are set in the metastore utils to {{(from deserializer)}}. For those serdes that can provide meaningful comments for a field, they should be propagated to the table description. These serde-provided comments could be prepended to (from deserializer) if others feel that's a meaningful distinction. This change involves updating {{StructField}} to support a (possibly null) comment field and then propagating this change out to the myriad places {{StructField}} is thrown around. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2156) Improve error messages emitted during task execution
[ https://issues.apache.org/jira/browse/HIVE-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083953#comment-13083953 ] Hudson commented on HIVE-2156: -- Integrated in Hive-trunk-h0.21 #890 (See [https://builds.apache.org/job/Hive-trunk-h0.21/890/]) HIVE-2156. Improve error messages emitted during task execution (Syed S. Albiz via Ning Zhang) nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1156928 Files : * /hive/trunk/ql/src/test/templates/TestNegativeCliDriver.vm * /hive/trunk/ql/src/test/results/clientnegative/script_broken_pipe1.q.out * /hive/trunk/ql/src/test/results/clientnegative/script_error.q.out * /hive/trunk/ql/src/test/results/clientnegative/script_broken_pipe2.q.out * /hive/trunk/ql/src/test/results/clientnegative/script_broken_pipe3.q.out * /hive/trunk/ql/src/test/results/clientnegative/minimr_broken_pipe.q.out * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/results/clientpositive/mapjoin_hook.q.out * /hive/trunk/conf/hive-default.xml * /hive/trunk/ql/build.xml * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java * /hive/trunk/ql/src/test/results/clientnegative/dyn_part3.q.out * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java * /hive/trunk/ql/src/test/results/clientnegative/udf_test_error.q.out * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/test/queries/clientnegative/minimr_broken_pipe.q * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java * /hive/trunk/ql/src/test/results/clientnegative/index_compact_size_limit.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join25.q.out * /hive/trunk/contrib/src/test/results/clientnegative/case_with_row_sequence.q.out * /hive/trunk/ql/src/test/results/clientnegative/udf_test_error_reduce.q.out * 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java * /hive/trunk/ql/src/test/results/clientnegative/index_compact_entry_limit.q.out * /hive/trunk/ql/src/test/results/clientnegative/udf_reflect_neg.q.out * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java * /hive/trunk/build-common.xml Improve error messages emitted during task execution Key: HIVE-2156 URL: https://issues.apache.org/jira/browse/HIVE-2156 Project: Hive Issue Type: Improvement Reporter: Syed S. Albiz Assignee: Syed S. Albiz Fix For: 0.8.0 Attachments: HIVE-2156.1.patch, HIVE-2156.10.patch, HIVE-2156.11.patch, HIVE-2156.12.patch, HIVE-2156.13.patch, HIVE-2156.2.patch, HIVE-2156.4.patch, HIVE-2156.8.patch, HIVE-2156.9.patch Follow-up to HIVE-1731 A number of issues were related to reporting errors from task execution and surfacing these in a more useful form. Currently a cryptic message with Execution Error and a return code and class name of the task is emitted. The most useful log messages here are emitted to the local logs, which can be found through jobtracker. Having either a pointer to these logs as part of the error message or the actual content would improve the usefulness substantially. It may also warrant looking into how the underlying error reporting through Hadoop is done and if more information can be propagated up from there. Specific issues raised in HIVE-1731: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask * issue was in regexp_extract syntax FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask * tried: desc table_does_not_exist; -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083954#comment-13083954 ] Hudson commented on HIVE-2322: -- Integrated in Hive-trunk-h0.21 #890 (See [https://builds.apache.org/job/Hive-trunk-h0.21/890/]) HIVE-2322. Add ColumnarSerDe to the list of native SerDes (Sohan Jain via pauly) pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1156931 Files : * /hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_6.q.out * /hive/trunk/ql/src/test/results/clientpositive/combine3.q.out * /hive/trunk/ql/src/test/results/clientpositive/rcfile_default_format.q.out * /hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_8.q.out * /hive/trunk/ql/src/test/results/clientpositive/alter_partition_format_loc.q.out * /hive/trunk/ql/src/test/results/clientpositive/index_compact_2.q.out * /hive/trunk/ql/src/test/results/clientpositive/rcfile_bigdata.q.out * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java * /hive/trunk/ql/src/test/results/clientpositive/index_compact_3.q.out * /hive/trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out * /hive/trunk/ql/src/test/results/clientpositive/alter_merge_stats.q.out * /hive/trunk/ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out * /hive/trunk/ql/src/test/results/clientpositive/index_bitmap_rc.q.out * /hive/trunk/ql/src/test/results/clientpositive/rcfile_columnar.q.out * /hive/trunk/ql/src/test/results/clientpositive/index_creation.q.out * /hive/trunk/ql/src/test/results/clientpositive/create_1.q.out * /hive/trunk/ql/src/test/queries/clientpositive/sample_islocalmode_hook.q * /hive/trunk/ql/src/test/results/clientpositive/columnarserde_create_shortcut.q.out Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan 
Jain Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, HIVE-2322.4.patch, HIVE-2322.5.patch We store metadata about ColumnarSerDes in the metastore, so it should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are only shown as from deserializer. Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira