[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified
[ https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007366#comment-13007366 ] Bennie Schut commented on HIVE-2054: Yes it was this code block: {code} try { File tmpFile = File.createTempFile(sessionID, ".pipeout", tmpDir); tmpFile.deleteOnExit(); startSs.setTmpOutputFile(tmpFile); } catch (IOException e) { throw new RuntimeException(e); } {code} So you are correct it's related to changes from HIVE-818. Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified - Key: HIVE-2054 URL: https://issues.apache.org/jira/browse/HIVE-2054 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Bennie Schut Assignee: Bennie Schut Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2054.1.patch.txt It seems something recently changed on the jdbc driver which causes this IOException on windows. java.lang.RuntimeException: java.io.IOException: The system cannot find the path specified at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237) at org.apache.hadoop.hive.jdbc.HiveConnection.&lt;init&gt;(HiveConnection.java:73) at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
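The failure comes from File.createTempFile, which throws this IOException on Windows when the configured temp directory does not exist (for example a Unix-style default such as /tmp/...). A minimal defensive sketch, assuming a hypothetical helper name and parameters -- this is not the committed HIVE-2054 fix:

```java
import java.io.File;
import java.io.IOException;

public class SessionTmpFile {
    // Hypothetical sketch: ensure the temp directory exists before
    // File.createTempFile is called. On Windows, createTempFile fails with
    // "The system cannot find the path specified" when the directory part
    // of the path is missing.
    static File createPipeOutFile(String sessionID, File tmpDir) throws IOException {
        if (!tmpDir.exists() && !tmpDir.mkdirs()) {
            throw new IOException("could not create temp dir: " + tmpDir);
        }
        File tmpFile = File.createTempFile(sessionID, ".pipeout", tmpDir);
        tmpFile.deleteOnExit();
        return tmpFile;
    }
}
```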
[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.
[ https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bennie Schut updated HIVE-1815: --- Attachment: HIVE-1815.1.patch.txt This is the simplest implementation I could do. Just changed the fetchOne to fetchN and return the result on each next() call until the list is empty and then do another fetchN. We've used this for a week and the performance increase on large resultsets is significant. You could also do the fetchN on a different thread to keep the queue full but that's a bit more work for just a little more gain. I've added 1 small test to call the setFetchSize and getFetchSize but the jdbc tests should all work like they worked before this test since the functionality doesn't change. The class HiveResultSet should implement batch fetching. Key: HIVE-1815 URL: https://issues.apache.org/jira/browse/HIVE-1815 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.5.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Attachments: HIVE-1815.1.patch.txt When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query. Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a resultset of anything other than a trivial size. It would be nice for the HiveResultSet to be able to fetch N rows from the server at a time, so that performance is suitable to support applications that provide human interaction. (From memory, I think it took me around 20 minutes to fetch 4000 rows.) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
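The batching described above (serve next() from a local buffer, refill with a single fetchN call when it runs dry) can be sketched as follows. The RowSource interface is an illustrative stand-in for the Thrift client's fetchN call, not the Hive JDBC driver's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of batch fetching: one server round trip per batch instead of per row.
public class BatchedResultSet {
    interface RowSource { List<String> fetchN(int n); }  // stand-in for the Thrift client

    private final RowSource client;
    private int fetchSize = 50;
    private final List<String> buffer = new ArrayList<String>();
    private String currentRow;

    BatchedResultSet(RowSource client) { this.client = client; }

    public void setFetchSize(int n) { fetchSize = n; }
    public int getFetchSize() { return fetchSize; }

    // Advance to the next row, fetching a new batch only when the buffer is empty.
    public boolean next() {
        if (buffer.isEmpty()) {
            buffer.addAll(client.fetchN(fetchSize));
            if (buffer.isEmpty()) return false;  // server exhausted
        }
        currentRow = buffer.remove(0);
        return true;
    }

    public String getRow() { return currentRow; }
}
```

With the default fetch size of 50, iterating 4000 rows costs about 80 round trips instead of 4000, which is where the reported speedup comes from.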
[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.
[ https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bennie Schut updated HIVE-1815: --- Fix Version/s: 0.8.0 Affects Version/s: (was: 0.5.0) 0.8.0 Release Note: Use batch fetching on the hive jdbc driver to increase performance. Status: Patch Available (was: Reopened) The class HiveResultSet should implement batch fetching. Key: HIVE-1815 URL: https://issues.apache.org/jira/browse/HIVE-1815 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.8.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Fix For: 0.8.0 Attachments: HIVE-1815.1.patch.txt When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query. Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a resultset of anything other than a trivial size. It would be nice for the HiveResultSet to be able to fetch N rows from the server at a time, so that performance is suitable to support applications that provide human interaction. (From memory, I think it took me around 20 minutes to fetch 4000 rows.) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1815) The class HiveResultSet should implement batch fetching.
[ https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007386#comment-13007386 ] Bennie Schut commented on HIVE-1815: https://reviews.apache.org/r/514/ The class HiveResultSet should implement batch fetching. Key: HIVE-1815 URL: https://issues.apache.org/jira/browse/HIVE-1815 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.8.0 Environment: Custom Java application using the Hive JDBC driver to connect to a Hive server, execute a Hive query and process the results. Reporter: Guy le Mar Fix For: 0.8.0 Attachments: HIVE-1815.1.patch.txt When using the Hive JDBC driver, you can execute a Hive query and obtain a HiveResultSet instance that contains the results of the query. Unfortunately, HiveResultSet can then only fetch a single row of these results from the Hive server at a time. As a consequence, it's extremely slow to fetch a resultset of anything other than a trivial size. It would be nice for the HiveResultSet to be able to fetch N rows from the server at a time, so that performance is suitable to support applications that provide human interaction. (From memory, I think it took me around 20 minutes to fetch 4000 rows.) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1095) Hive in Maven
[ https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerrit Jansen van Vuuren updated HIVE-1095: --- Attachment: HIVE-1095.v4.PATCH fixed, Hive in Maven - Key: HIVE-1095 URL: https://issues.apache.org/jira/browse/HIVE-1095 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.6.0 Reporter: Gerrit Jansen van Vuuren Priority: Minor Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, hiveReleasedToMaven.tar.gz Getting hive into maven main repositories Documentation on how to do this is on: http://maven.apache.org/guides/mini/guide-central-repository-upload.html -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: status of 0.7.0
Thanks for that info. Bill On Tue, Mar 15, 2011 at 6:28 PM, Carl Steinbach c...@cloudera.com wrote: Hi Bill, There are two open blocker tickets related to bugs in the metastore upgrade scripts (which are present in rc0). Once these are resolved we'll be ready to vote on a new release candidate. Thanks. Carl On Tue, Mar 15, 2011 at 7:08 AM, Bill Au bill.w...@gmail.com wrote: What's the status of 0.7.0? I noticed that rc0 was made available back on 2/18. But then there has been no vote on it at all. Is that safe to use? Bill
Review Request: HIVE-1815: The class HiveResultSet should implement batch fetching.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/514/ --- Review request for hive. Summary --- HIVE-1815: The class HiveResultSet should implement batch fetching. This addresses bug HIVE-1815. https://issues.apache.org/jira/browse/HIVE-1815 Diffs - trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 1081785 trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1081785 trunk/jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java 1081785 Diff: https://reviews.apache.org/r/514/diff Testing --- Thanks, Bennie
[jira] Updated: (HIVE-2058) MySQL Upgrade scripts missing new defaults for two tables' columns
[ https://issues.apache.org/jira/browse/HIVE-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Tunney updated HIVE-2058: - Labels: (was: derby_triage10_5_2) MySQL Upgrade scripts missing new defaults for two tables' columns -- Key: HIVE-2058 URL: https://issues.apache.org/jira/browse/HIVE-2058 Project: Hive Issue Type: Bug Components: Metastore Reporter: Stephen Tunney Priority: Blocker Upgraded from 0.5.0 to 0.7.0, and the upgrade scripts to 0.6.0 and 0.7.0 did not have two defaults that are necessary for being able to create a Hive table. The columns missing default values are: COLUMNS.INTEGER_IDX SDS.IS_COMPRESSED I set them both to zero (0) (false for IS_COMPRESSED, obviously). The absence of these two defaults prevents the ability to create a table in Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-2058) MySQL Upgrade scripts missing new defaults for two tables' columns
MySQL Upgrade scripts missing new defaults for two tables' columns -- Key: HIVE-2058 URL: https://issues.apache.org/jira/browse/HIVE-2058 Project: Hive Issue Type: Bug Components: Metastore Reporter: Stephen Tunney Priority: Blocker Upgraded from 0.5.0 to 0.7.0, and the upgrade scripts to 0.6.0 and 0.7.0 did not have two defaults that are necessary for being able to create a Hive table. The columns missing default values are: COLUMNS.INTEGER_IDX SDS.IS_COMPRESSED I set them both to zero (0) (false for IS_COMPRESSED, obviously). The absence of these two defaults prevents the ability to create a table in Hive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2028) Performance instruments for client side execution
[ https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2028: - Attachment: HIVE-2028.2.patch Updated to the latest trunk. Performance instruments for client side execution - Key: HIVE-2028 URL: https://issues.apache.org/jira/browse/HIVE-2028 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2028.2.patch, HIVE-2028.patch Hive client side execution could sometimes take a long time. This task is to instrument the client side code to measure the time spent in the most likely expensive components. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2011) upgrade-0.6.0.mysql.sql script attempts to increase size of PK COLUMNS.TYPE_NAME to 4000
[ https://issues.apache.org/jira/browse/HIVE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2011: - Attachment: HIVE-2011.2.patch.txt upgrade-0.6.0.mysql.sql script attempts to increase size of PK COLUMNS.TYPE_NAME to 4000 Key: HIVE-2011 URL: https://issues.apache.org/jira/browse/HIVE-2011 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.6.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Priority: Blocker Fix For: 0.7.0 Attachments: HIVE-2011.1.patch.txt, HIVE-2011.2.patch.txt {code} # mysql flumenewresearch upgrade-0.6.0.mysql.sql ERROR 1071 (42000) at line 16: Specified key was too long; max key length is 767 bytes {code} Here's the cause of the problem from upgrade-0.6.0.mysql.sql: {code} ... ALTER TABLE `COLUMNS` MODIFY `TYPE_NAME` VARCHAR(4000); ... ALTER TABLE `COLUMNS` DROP PRIMARY KEY; ALTER TABLE `COLUMNS` ADD PRIMARY KEY (`SD_ID`, `COLUMN_NAME`); ... {code} We need to make sure that the PK on COLUMNS.TYPE_NAME is dropped before the size of the column is bumped to 4000. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2011) upgrade-0.6.0.mysql.sql script attempts to increase size of PK COLUMNS.TYPE_NAME to 4000
[ https://issues.apache.org/jira/browse/HIVE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2011: - Status: Patch Available (was: Open) Updated the patch with more extensive instructions and official schemas for Hive 0.3.0 through 0.7.0 upgrade-0.6.0.mysql.sql script attempts to increase size of PK COLUMNS.TYPE_NAME to 4000 Key: HIVE-2011 URL: https://issues.apache.org/jira/browse/HIVE-2011 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.6.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Priority: Blocker Fix For: 0.7.0 Attachments: HIVE-2011.1.patch.txt, HIVE-2011.2.patch.txt {code} # mysql flumenewresearch upgrade-0.6.0.mysql.sql ERROR 1071 (42000) at line 16: Specified key was too long; max key length is 767 bytes {code} Here's the cause of the problem from upgrade-0.6.0.mysql.sql: {code} ... ALTER TABLE `COLUMNS` MODIFY `TYPE_NAME` VARCHAR(4000); ... ALTER TABLE `COLUMNS` DROP PRIMARY KEY; ALTER TABLE `COLUMNS` ADD PRIMARY KEY (`SD_ID`, `COLUMN_NAME`); ... {code} We need to make sure that the PK on COLUMNS.TYPE_NAME is dropped before the size of the column is bumped to 4000. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2028) Performance instruments for client side execution
[ https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007625#comment-13007625 ] Paul Yang commented on HIVE-2028: - In PerfLogEnd(): {code} sb.append("/"); {code} Shouldn't this be a "&lt;/" since this is a close tag? Performance instruments for client side execution - Key: HIVE-2028 URL: https://issues.apache.org/jira/browse/HIVE-2028 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2028.2.patch, HIVE-2028.patch Hive client side execution could sometimes take a long time. This task is to instrument the client side code to measure the time spent in the most likely expensive components. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
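The review comment is pointing out that an end marker needs the "</" of a close tag, not a bare "/". A toy sketch of such begin/end markers, with method names and log format assumed for illustration (not the actual Hive PerfLogger output):

```java
// Illustrative begin/end log markers: the end marker must start with "</"
// so it reads as a close tag matching the "<PERFLOG ...>" begin marker.
public class PerfLog {
    static String perfLogBegin(String method) {
        return "<PERFLOG method=" + method + ">";
    }

    static String perfLogEnd(String method, long startMs, long endMs) {
        StringBuilder sb = new StringBuilder("</PERFLOG method=").append(method);
        sb.append(" duration=").append(endMs - startMs).append(">");
        return sb.toString();
    }
}
```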
[jira] Updated: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption
[ https://issues.apache.org/jira/browse/HIVE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2059: - Priority: Blocker (was: Major) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption -- Key: HIVE-2059 URL: https://issues.apache.org/jira/browse/HIVE-2059 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Priority: Blocker Fix For: 0.7.0 Attachments: HIVE-2059.1.patch.txt In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which changed some of the defaults for how field names get mapped to datastore identifiers. This problem was resolved in HIVE-1435 by setting datanucleus.identifierFactory=datanucleus in hive-default.xml. However, this property definition was not added to HiveConf. This can result in schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 and retains the Hive 0.5.0 version of hive-default.xml on their classpath. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
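The missing HiveConf default corresponds to the hive-default.xml entry introduced by HIVE-1435, roughly along these lines (the description text below is illustrative, not quoted from Hive):

```xml
<property>
  <name>datanucleus.identifierFactory</name>
  <value>datanucleus</value>
  <description>Name of the identifier factory DataNucleus uses when mapping
  field names to datastore identifiers; pinning it preserves the pre-2.0
  naming scheme and avoids metastore schema corruption.</description>
</property>
```

Adding the same default to HiveConf makes the setting effective even when an old hive-default.xml is on the classpath.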
[jira] Updated: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption
[ https://issues.apache.org/jira/browse/HIVE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2059: - Attachment: HIVE-2059.1.patch.txt Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption -- Key: HIVE-2059 URL: https://issues.apache.org/jira/browse/HIVE-2059 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.7.0 Attachments: HIVE-2059.1.patch.txt In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which changed some of the defaults for how field names get mapped to datastore identifiers. This problem was resolved in HIVE-1435 by setting datanucleus.identifierFactory=datanucleus in hive-default.xml. However, this property definition was not added to HiveConf. This can result in schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 and retains the Hive 0.5.0 version of hive-default.xml on their classpath. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption
Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption -- Key: HIVE-2059 URL: https://issues.apache.org/jira/browse/HIVE-2059 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.7.0 Attachments: HIVE-2059.1.patch.txt In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which changed some of the defaults for how field names get mapped to datastore identifiers. This problem was resolved in HIVE-1435 by setting datanucleus.identifierFactory=datanucleus in hive-default.xml. However, this property definition was not added to HiveConf. This can result in schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 and retains the Hive 0.5.0 version of hive-default.xml on their classpath. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption
[ https://issues.apache.org/jira/browse/HIVE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2059: - Status: Patch Available (was: Open) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption -- Key: HIVE-2059 URL: https://issues.apache.org/jira/browse/HIVE-2059 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.7.0 Attachments: HIVE-2059.1.patch.txt In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which changed some of the defaults for how field names get mapped to datastore identifiers. This problem was resolved in HIVE-1435 by setting datanucleus.identifierFactory=datanucleus in hive-default.xml. However, this property definition was not added to HiveConf. This can result in schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 and retains the Hive 0.5.0 version of hive-default.xml on their classpath. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-2059: Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/515/ --- Review request for hive. Summary --- Review request for HIVE-2059. This addresses bug HIVE-2059. https://issues.apache.org/jira/browse/HIVE-2059 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8325870 Diff: https://reviews.apache.org/r/515/diff Testing --- Thanks, Carl
Jenkins build is back to normal : Hive-0.7.0-h0.20 #40
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/40/
[jira] Updated: (HIVE-2028) Performance instruments for client side execution
[ https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2028: - Attachment: HIVE-2028.3.patch Good catch Paul. I'm uploading a new one correcting this. Performance instruments for client side execution - Key: HIVE-2028 URL: https://issues.apache.org/jira/browse/HIVE-2028 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2028.2.patch, HIVE-2028.3.patch, HIVE-2028.patch Hive client side execution could sometimes take a long time. This task is to instrument the client side code to measure the time spent in the most likely expensive components. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2049. Push down partition pruning to JDO filtering for a subset of partition predicates
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/489/#review333 --- trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/489/#comment674 will do. trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java https://reviews.apache.org/r/489/#comment673 I think explicitly catching HiveException and throwing it right away will save the creation of a HiveException in the catch(Exception) block, right? trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java https://reviews.apache.org/r/489/#comment675 I agree in general we should use a generic interface in the declaration, but I'd prefer to leave the LinkedHashMap here. The reason is that what we need for the partSpec is an ordered map where the order of iterator.getNext() should be the same as the order of elements being inserted. Unfortunately the Java collections framework doesn't have this interface, just an implementation. We could declare an interface just for that, but that should be a different issue. - Ning On 2011-03-11 14:59:46, Ning Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/489/ --- (Updated 2011-03-11 14:59:46) Review request for hive. Summary --- * expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that PartitionPruner can use that for certain partition predicates. * only allows {=, AND, OR} in the partition predicates that can be pushed down to JDO filtering. Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1080788 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1080788 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1080788 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1080788 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1080788 Diff: https://reviews.apache.org/r/489/diff Testing --- Thanks, Ning
Re: Review Request: HIVE-2049. Push down partition pruning to JDO filtering for a subset of partition predicates
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/489/ --- (Updated 2011-03-16 16:50:15.347839) Review request for hive. Changes --- Add more changes to suit the current implementation of JDO filtering. Some major changes are: - ExpressionTree.makeFilterForEquals(): previously '=' was translated to the JDO method matches(). This will introduce false positives if the partition value contains regex special characters (e.g., dot). I changed this function to use startsWith(), endsWith(), and indexOf() depending on whether the partition column is at the beginning, end or middle of the partition spec string. Two unit test files (ppr_pushdown*.q) are added to test these cases. - ObjectStore.listMPartitions(): added a query.setOrdering() to return partitions ordered by their partition names. This is to be backward compatible with the old partition pruning behavior. - PartitionPruner.prune(): check if the partition pruning expression contains non-partition columns. If so, add the resulting partitions to unkn_parts, otherwise to true_parts. This is required by downstream optimizations. - Utilities.checkJDOPushDown(): return true only if the partition column type is string and the constant type is string. This is required by the current implementation of the JDO filter (see ExpressionTree.java and Filter.g). Passed all unit tests. Summary --- * expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that PartitionPruner can use that for certain partition predicates. * only allows {=, AND, OR} in the partition predicates that can be pushed down to JDO filtering. 
Diffs (updated) - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1081948 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1081948 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 1081948 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1081948 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1081948 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1081948 trunk/ql/src/test/queries/clientpositive/ppr_pushdown.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/ppr_pushdown2.q PRE-CREATION trunk/ql/src/test/results/clientpositive/ppr_pushdown.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/ppr_pushdown2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/489/diff Testing --- Thanks, Ning
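The makeFilterForEquals() change described above can be sketched as a plain string check: test for "col=val" at the beginning, end, or middle of the partition name instead of calling a regex matches(), so values containing regex metacharacters (such as a dot) cannot produce false positives. The "col=val/col=val" partition-name layout and the method name are assumptions for illustration:

```java
// Equality check on a partition column against a partition-name string
// using only startsWith/endsWith/indexOf -- no regex interpretation.
public class PartNameEquals {
    static boolean columnEquals(String partName, String col, String val) {
        String pair = col + "=" + val;
        if (partName.equals(pair)) return true;            // the only component
        if (partName.startsWith(pair + "/")) return true;  // at the beginning
        if (partName.endsWith("/" + pair)) return true;    // at the end
        return partName.indexOf("/" + pair + "/") >= 0;    // in the middle
    }
}
```

With a regex-based matches(), a value like "2011-03.16" would falsely match the partition "ds=2011-03-16"; the string-based check above does not.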
[jira] Updated: (HIVE-2049) Push down partition pruning to JDO filtering for a subset of partition predicates
[ https://issues.apache.org/jira/browse/HIVE-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2049: - Attachment: HIVE-2049.3.patch Uploading HIVE-2049.3.patch. This one passed all unit tests. I've also updated the review board. Push down partition pruning to JDO filtering for a subset of partition predicates - Key: HIVE-2049 URL: https://issues.apache.org/jira/browse/HIVE-2049 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2049.2.patch, HIVE-2049.3.patch, HIVE-2049.patch Several tasks: - expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that PartitionPruner can use that for certain partition predicates. - figure out a safe subset of partition predicates that can be pushed down to JDO filtering. My initial testing for the 2nd part shows that equality queries with AND/OR can be pushed down and return correct results. However, range queries on partition columns gave an NPE from the JDO execute() function. This might be a bug in the JDO query string itself, but we need to figure it out and heavily test all cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
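The "safe subset" idea above can be sketched as a recursive walk over the predicate tree that allows pushdown only when every operator is =, AND, or OR. The Node class is a toy stand-in for Hive's expression nodes, for illustration only:

```java
import java.util.Arrays;
import java.util.List;

// Toy predicate tree plus a check for the pushable operator subset {=, AND, OR}.
public class PushDownCheck {
    static class Node {
        final String op;            // "=", "AND", "OR", ">", ... or a column/constant at a leaf
        final List<Node> children;
        Node(String op, Node... children) {
            this.op = op;
            this.children = Arrays.asList(children);
        }
    }

    // True only if every interior operator in the tree is =, AND, or OR.
    static boolean canPushDown(Node n) {
        if (!n.op.equals("=") && !n.op.equals("AND") && !n.op.equals("OR")) return false;
        for (Node c : n.children) {
            // Leaves (column references, constants) carry no operator to validate.
            if (!c.children.isEmpty() && !canPushDown(c)) return false;
        }
        return true;
    }
}
```

A predicate containing a range comparison (e.g. ">") fails the check and falls back to client-side pruning, matching the behavior the issue describes.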
[jira] Updated: (HIVE-2049) Push down partition pruning to JDO filtering for a subset of partition predicates
[ https://issues.apache.org/jira/browse/HIVE-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-2049: - Status: Patch Available (was: Open) Push down partition pruning to JDO filtering for a subset of partition predicates - Key: HIVE-2049 URL: https://issues.apache.org/jira/browse/HIVE-2049 Project: Hive Issue Type: Sub-task Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2049.2.patch, HIVE-2049.3.patch, HIVE-2049.patch Several tasks: - expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that PartitionPruner can use that for certain partition predicates. - figure out a safe subset of partition predicates that can be pushed down to JDO filtering. My initial testing for the 2nd part shows that equality queries with AND/OR can be pushed down and return correct results. However, range queries on partition columns gave an NPE from the JDO execute() function. This might be a bug in the JDO query string itself, but we need to figure it out and heavily test all cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2028) Performance instruments for client side execution
[ https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007759#comment-13007759 ] Paul Yang commented on HIVE-2028: - +1 Will test and commit. Performance instruments for client side execution - Key: HIVE-2028 URL: https://issues.apache.org/jira/browse/HIVE-2028 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2028.2.patch, HIVE-2028.3.patch, HIVE-2028.patch Hive client side execution could sometimes take a long time. This task is to instrument the client side code to measure the time spent in the most likely expensive components. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2049. Push down partition pruning to JDO filtering for a subset of partition predicates
On 2011-03-16 16:37:44, Ning Zhang wrote: trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java, line 244 https://reviews.apache.org/r/489/diff/1/?file=13887#file13887line244 I agree in general we should use a generic interface in the declaration, but I'd prefer to leave the LinkedHashMap here. The reason is that what we need for the partSpec is an ordered map where the order of iterator.getNext() should be the same as the order of elements being inserted. Unfortunately the Java collections framework doesn't have this interface, just an implementation. We could declare an interface just for that, but that should be a different issue. Using the generic interface (java.util.Map in this context) in the declaration will not affect the expected FIFO functionality. Further, when this map instance is being passed around to get the partition in the getPartition(...) method, the reference type is the generic Map and not the implementation type. Here is a sample program to illustrate that using Map as the reference for a LinkedHashMap maintains the order:
{code}
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class Sample {
  public static void main(String[] args) {
    Map<String, String> map = new LinkedHashMap<String, String>();
    map.put("A", "A");
    map.put("B", "B");
    map.put("C", "C");
    map.put("D", "D");
    map.put("E", "E");
    map.put("F", "F");
    map.put("G", "G");
    Iterator<String> iter = map.keySet().iterator();
    while (iter.hasNext()) {
      System.out.println(map.get(iter.next()));
    }
    for (Map.Entry<String, String> entry : map.entrySet()) {
      System.out.println(entry.getKey() + " " + entry.getValue());
    }
  }
}
{code}
On 2011-03-16 16:37:44, Ning Zhang wrote: trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java, line 216 https://reviews.apache.org/r/489/diff/1/?file=13887#file13887line216 I think explicitly catching HiveException and throwing it right away will save the creation of a HiveException in the catch(Exception) block, right? Fine. - M --- This is an automatically generated e-mail. 
To reply, visit: https://reviews.apache.org/r/489/#review333 --- On 2011-03-16 16:50:15, Ning Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/489/ --- (Updated 2011-03-16 16:50:15) Review request for hive. Summary --- * expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that PartitionPruner can use that for certain partition predicates. * only allows {=, AND, OR} in the partition predicates that can be pushed down to JDO filtering. Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1081948 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1081948 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 1081948 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1081948 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1081948 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1081948 trunk/ql/src/test/queries/clientpositive/ppr_pushdown.q PRE-CREATION trunk/ql/src/test/queries/clientpositive/ppr_pushdown2.q PRE-CREATION trunk/ql/src/test/results/clientpositive/ppr_pushdown.q.out PRE-CREATION trunk/ql/src/test/results/clientpositive/ppr_pushdown2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/489/diff Testing --- Thanks, Ning
Re: Review Request: HIVE-2049. Push down partition pruning to JDO filtering for a subset of partition predicates
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/489/#review336 ---

trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/489/#comment679 Why can't we just do:

    return (value instanceof String);

trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/489/#comment680 Why can't we just do:

    return (fs.getType().equals(Constants.STRING_TYPE_NAME));

- M
[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified
[ https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007803#comment-13007803 ] Ning Zhang commented on HIVE-2054: -- I think the problem is that the tmpDir by default is /tmp/user.name/, specified by hive.querylog.location. Bennie, if you set hive.querylog.location to something like 'c:\' on Windows, it should work. Can you try that? The JdbcSessionState was introduced in the very first version of the JDBC driver (HIVE-48). It is not used, but I guess the reason for it was to allow setting session-level state for the JDBC connection. I'm OK with removing it, but I'm not sure about others' opinions. Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified - Key: HIVE-2054 URL: https://issues.apache.org/jira/browse/HIVE-2054 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.8.0 Reporter: Bennie Schut Assignee: Bennie Schut Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2054.1.patch.txt It seems something recently changed on the jdbc driver which causes this IOException on windows. java.lang.RuntimeException: java.io.IOException: The system cannot find the path specified at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237) at org.apache.hadoop.hive.jdbc.HiveConnection.init(HiveConnection.java:73) at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
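The stack trace above points at File.createTempFile() inside SessionState.start(), which throws IOException when the directory from hive.querylog.location does not exist (e.g. a Unix-style /tmp path on Windows). The sketch below reproduces the failure mode and one possible guard; the helper name and the mkdirs() fallback are illustrative assumptions, not the actual Hive patch:

```java
import java.io.File;
import java.io.IOException;

public class TmpDirCheck {
    // Illustrative sketch (not Hive's actual code): File.createTempFile throws
    // IOException when the target directory does not exist. Creating the
    // directory first is one possible guard against that.
    public static File createPipeOutFile(String sessionId, File tmpDir) throws IOException {
        if (!tmpDir.exists() && !tmpDir.mkdirs()) {
            throw new IOException("Could not create tmp dir: " + tmpDir);
        }
        File f = File.createTempFile(sessionId, ".pipeout", tmpDir);
        f.deleteOnExit(); // mirrors the deleteOnExit() in the code block quoted above
        return f;
    }

    public static void main(String[] args) throws IOException {
        // Use a subdirectory of the platform tmp dir that may not exist yet.
        File dir = new File(System.getProperty("java.io.tmpdir"), "hivetest");
        File f = createPipeOutFile("session1", dir);
        System.out.println(f.getName().endsWith(".pipeout"));
    }
}
```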
[jira] Updated: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption
[ https://issues.apache.org/jira/browse/HIVE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sichi updated HIVE-2059: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to branch and trunk. Thanks Carl! Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption - Key: HIVE-2059 URL: https://issues.apache.org/jira/browse/HIVE-2059 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Priority: Blocker Fix For: 0.7.0 Attachments: HIVE-2059.1.patch.txt In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which changed some of the defaults for how field names get mapped to datastore identifiers. This problem was resolved in HIVE-1435 by setting datanucleus.identifierFactory=datanucleus in hive-default.xml. However, this property definition was not added to HiveConf. This can result in schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 and retains the Hive 0.5.0 version of hive-default.xml on their classpath. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
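The property=value pair above (datanucleus.identifierFactory=datanucleus, from HIVE-1435) corresponds to a hive-default.xml entry along these lines; the description text here is paraphrased, not the exact wording shipped in the file:

```xml
<property>
  <name>datanucleus.identifierFactory</name>
  <value>datanucleus</value>
  <description>Keep the pre-DataNucleus-2.0 mapping of field names to
  datastore identifiers so existing metastore schemas are not corrupted.
  </description>
</property>
```

Defining the same default in HiveConf means the setting still takes effect even when an outdated Hive 0.5.0 hive-default.xml is on the classpath.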
[jira] Commented: (HIVE-1959) Potential memory leak when same connection used for long time. TaskInfo and QueryInfo objects are getting accumulated on executing more queries on the same connection.
[ https://issues.apache.org/jira/browse/HIVE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007828#comment-13007828 ] Ning Zhang commented on HIVE-1959: -- Sorry Chinna, I must have missed your last comment. Yeah, I think it will work with the current patch. I will start testing and commit if it passes. Potential memory leak when same connection used for long time. TaskInfo and QueryInfo objects are getting accumulated on executing more queries on the same connection. --- Key: HIVE-1959 URL: https://issues.apache.org/jira/browse/HIVE-1959 Project: Hive Issue Type: Bug Components: Server Infrastructure Affects Versions: 0.8.0 Environment: Hadoop 0.20.1, Hive 0.5.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-1959.patch *org.apache.hadoop.hive.ql.history.HiveHistory$TaskInfo* and *org.apache.hadoop.hive.ql.history.HiveHistory$QueryInfo* objects accumulate as more queries are executed on the same connection. These objects are released only when the connection is closed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
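The accumulation pattern described in this issue can be sketched as follows. The class and method names are modeled loosely on HiveHistory's per-query bookkeeping and are hypothetical; the point is only the leak shape (entries added per query, removed only at connection close) and the fix direction (remove each entry when its query ends):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the leak pattern, NOT the actual Hive code.
public class HistoryLeakSketch {
    static class QueryInfo { /* per-query bookkeeping */ }

    private final Map<String, QueryInfo> queryInfoMap = new HashMap<String, QueryInfo>();

    public void startQuery(String queryId) {
        // One entry per executed query; on a long-lived connection these
        // pile up if nothing ever removes them.
        queryInfoMap.put(queryId, new QueryInfo());
    }

    // The fix direction: release the entry when the query ends, instead of
    // waiting for the connection to close.
    public void endQuery(String queryId) {
        queryInfoMap.remove(queryId);
    }

    public int liveEntries() {
        return queryInfoMap.size();
    }

    public static void main(String[] args) {
        HistoryLeakSketch h = new HistoryLeakSketch();
        for (int i = 0; i < 1000; i++) {
            h.startQuery("q" + i);
            h.endQuery("q" + i); // without this call, 1000 entries would linger
        }
        System.out.println(h.liveEntries());
    }
}
```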