[jira] [Created] (HIVE-12865) Exchange partition does not show inputs field for post/pre execute hooks
Paul Yang created HIVE-12865:

Summary: Exchange partition does not show inputs field for post/pre execute hooks
Key: HIVE-12865
URL: https://issues.apache.org/jira/browse/HIVE-12865
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.12.0
Reporter: Paul Yang

The pre/post execute hook interface has fields that indicate which Hive objects were read / written to as a result of running the query. For the exchange partition operation, the read entity field is empty. This is an important issue as the hook interface may be configured to perform critical warehouse operations.

See ql/src/test/results/clientpositive/exchange_partition3.q.out

{code}
--- a/ql/src/test/results/clientpositive/exchange_partition3.q.out
+++ b/ql/src/test/results/clientpositive/exchange_partition3.q.out
@@ -65,9 +65,17 @@ ds=2013-04-05/hr=2
 PREHOOK: query: -- This will exchange both partitions hr=1 and hr=2
 ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH TABLE exchange_part_test2
 PREHOOK: type: ALTERTABLE_EXCHANGEPARTITION
+PREHOOK: Output: default@exchange_part_test1
+PREHOOK: Output: default@exchange_part_test2
 POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
 ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH TABLE exchange_part_test2
 POSTHOOK: type: ALTERTABLE_EXCHANGEPARTITION
+POSTHOOK: Output: default@exchange_part_test1
+POSTHOOK: Output: default@exchange_part_test1@ds=2013-04-05/hr=1
+POSTHOOK: Output: default@exchange_part_test1@ds=2013-04-05/hr=2
+POSTHOOK: Output: default@exchange_part_test2
+POSTHOOK: Output: default@exchange_part_test2@ds=2013-04-05/hr=1
+POSTHOOK: Output: default@exchange_part_test2@ds=2013-04-05/hr=2
 PREHOOK: query: SHOW PARTITIONS exchange_part_test1
 PREHOOK: type: SHOWPARTITIONS
 PREHOOK: Input: default@exchange_part_test1
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11554) Exchange partition outputs missing from post execute hooks
Paul Yang created HIVE-11554:

Summary: Exchange partition outputs missing from post execute hooks
Key: HIVE-11554
URL: https://issues.apache.org/jira/browse/HIVE-11554
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 1.2.0, 1.0.0, 0.14.0, 0.13.0, 0.12.0
Reporter: Paul Yang

The pre/post execute hook interface has fields that indicate which Hive objects were read / written to as a result of running the query. For the exchange partition operation, these fields (ReadEntity and WriteEntity) are empty. This is an important issue as the hook interface may be configured to perform critical warehouse operations.

See
{noformat}
ql/src/test/results/clientpositive/exchange_partition3.q.out
{noformat}

{noformat}
POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH TABLE exchange_part_test2
POSTHOOK: type: null
{noformat}

The post hook should not say null.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3042) thrift jars do not need to be passed to the mappers and reducers
[ https://issues.apache.org/jira/browse/HIVE-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-3042: Attachment: HIVE-3042.1.patch thrift jars do not need to be passed to the mappers and reducers Key: HIVE-3042 URL: https://issues.apache.org/jira/browse/HIVE-3042 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Namit Jain Assignee: Paul Yang Fix For: 0.10.0 Attachments: HIVE-3042.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3042) Thrift classes do not need to be passed to the mappers and reducers
[ https://issues.apache.org/jira/browse/HIVE-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-3042: Summary: Thrift classes do not need to be passed to the mappers and reducers (was: thrift jars do not need to be passed to the mappers and reducers) Thrift classes do not need to be passed to the mappers and reducers --- Key: HIVE-3042 URL: https://issues.apache.org/jira/browse/HIVE-3042 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Namit Jain Assignee: Paul Yang Fix For: 0.10.0 Attachments: HIVE-3042.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3042) Thrift classes do not need to be passed to the mappers and reducers
[ https://issues.apache.org/jira/browse/HIVE-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-3042: Fix Version/s: 0.10.0 Affects Version/s: 0.10.0 Status: Patch Available (was: Open) Thrift classes do not need to be passed to the mappers and reducers --- Key: HIVE-3042 URL: https://issues.apache.org/jira/browse/HIVE-3042 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Namit Jain Assignee: Paul Yang Fix For: 0.10.0 Attachments: HIVE-3042.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3042) Thrift classes do not need to be passed to the mappers and reducers
[ https://issues.apache.org/jira/browse/HIVE-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-3042: Resolution: Duplicate Status: Resolved (was: Patch Available) Thrift classes do not need to be passed to the mappers and reducers --- Key: HIVE-3042 URL: https://issues.apache.org/jira/browse/HIVE-3042 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Namit Jain Assignee: Paul Yang Fix For: 0.10.0 Attachments: HIVE-3042.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3042) Thrift classes do not need to be passed to the mappers and reducers
[ https://issues.apache.org/jira/browse/HIVE-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280584#comment-13280584 ] Paul Yang commented on HIVE-3042: - Duplicate of HIVE-3040 Thrift classes do not need to be passed to the mappers and reducers --- Key: HIVE-3042 URL: https://issues.apache.org/jira/browse/HIVE-3042 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Namit Jain Assignee: Paul Yang Fix For: 0.10.0 Attachments: HIVE-3042.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3000) Potential infinite loop / log spew in ZookeeperHiveLockManager
Paul Yang created HIVE-3000:

Summary: Potential infinite loop / log spew in ZookeeperHiveLockManager
Key: HIVE-3000
URL: https://issues.apache.org/jira/browse/HIVE-3000
Project: Hive
Issue Type: Bug
Components: Locking
Affects Versions: 0.9.0
Reporter: Paul Yang

See ZookeeperHiveLockManager.lock().

If Zookeeper is in a bad state, it's possible to get an exception (e.g. org.apache.zookeeper.KeeperException$SessionExpiredException) when we call lockPrimitive(). There is a bug in the exception handler: the loop does not exit, because the break in the switch statement exits only the switch, not the enclosing do..while loop. Because tryNum is not incremented when the exception is thrown, lockPrimitive() will be called in an infinite loop, as fast as possible. Since the exception is printed for each call, Hive will produce significant log spew.
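The control-flow problem in this report can be reproduced in a few lines. The following is a minimal Java sketch, not the actual ZookeeperHiveLockManager code: the `lockPrimitive` stub, the `SessionExpiredException` class, the call counter, and the safety cap are all hypothetical stand-ins added for illustration. It shows why a `break` inside a `switch` leaves the enclosing `do..while` running, and one possible way to bound the loop by counting every attempt, including failed ones.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LockRetrySketch {
    /** Hypothetical stand-in for a ZooKeeper session failure. */
    static class SessionExpiredException extends Exception {}

    /** Counts how often the hypothetical lock primitive is invoked. */
    static final AtomicInteger calls = new AtomicInteger();

    /** Hypothetical lock primitive that always fails, as if the service were in a bad state. */
    static boolean lockPrimitive() throws SessionExpiredException {
        calls.incrementAndGet();
        throw new SessionExpiredException();
    }

    /**
     * Buggy shape: tryNum is incremented only on the non-exception path, and the
     * break in the handler exits the switch, not the loop. The safety cap exists
     * only so that this demonstration terminates.
     */
    static int buggyLock(int numRetries, int safetyCap) {
        calls.set(0);
        int tryNum = 0;
        do {
            try {
                if (lockPrimitive()) {
                    break; // this break does leave the do..while
                }
                tryNum++;
            } catch (Exception e) {
                int code = (e instanceof SessionExpiredException) ? 1 : 0;
                switch (code) {
                    case 1:
                        break; // leaves the switch only; the do..while continues
                    default:
                        break;
                }
            }
        } while (tryNum < numRetries && calls.get() < safetyCap);
        return calls.get();
    }

    /** One possible fix: count every attempt so that exceptions also consume a retry. */
    static int boundedLock(int numRetries) {
        calls.set(0);
        int tryNum = 0;
        do {
            tryNum++; // incremented before the call, so a thrown exception still counts
            try {
                if (lockPrimitive()) {
                    break;
                }
            } catch (SessionExpiredException e) {
                // a real implementation would log here, once per attempt
            }
        } while (tryNum < numRetries);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("buggy calls (capped): " + buggyLock(3, 1000));
        System.out.println("bounded calls: " + boundedLock(3));
    }
}
```

With three retries configured, the buggy shape stops only when it reaches the artificial cap, while the bounded shape stops after three attempts. A labeled `break` on the outer loop would be another way to leave the loop from inside the `switch`.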
[jira] [Updated] (HIVE-2942) substr on string containing UTF-8 characters produces StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2942: Resolution: Fixed Fix Version/s: 0.10 Status: Resolved (was: Patch Available) Whoops, forgot to mention that I was running the tests too - sorry Namit. They passed, and I committed. Thanks Kevin! substr on string containing UTF-8 characters produces StringIndexOutOfBoundsException - Key: HIVE-2942 URL: https://issues.apache.org/jira/browse/HIVE-2942 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Kevin Wilfong Fix For: 0.10 Attachments: HIVE-2942.D2727.1.patch After HIVE-2792, the substr function produces a StringIndexOutOfBoundsException when called on a string containing UTF-8 characters without the length argument being present. E.g. select substr(str, 1) from table1; now fails with that exception if str contains a UTF-8 character for any row in the table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
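The report does not state the root cause, so the following Java sketch is only an illustration of one common way such an exception can arise: using a UTF-8 byte count as if it were a character index. The class and method names are hypothetical and this is not Hive's substr implementation; the hypothetical `substrFrom` also ignores the negative-start behaviour a real substr may have.

```java
import java.nio.charset.StandardCharsets;

class Utf8SubstrSketch {
    /** Number of bytes in the UTF-8 encoding of s. */
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Returns true if using the UTF-8 byte length as an end index overruns the string. */
    static boolean byteLengthOverruns(String s) {
        try {
            s.substring(0, utf8ByteLength(s));
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Hypothetical substr(str, start) with a 1-based start, counted in code points. */
    static String substrFrom(String s, int start) {
        int codePoints = s.codePointCount(0, s.length());
        if (start < 1 || start > codePoints) {
            return "";
        }
        // Translate the code point position into a char index before slicing.
        int beginIndex = s.offsetByCodePoints(0, start - 1);
        return s.substring(beginIndex);
    }

    public static void main(String[] args) {
        String str = "h\u00e9llo"; // the second character takes two bytes in UTF-8
        System.out.println("chars: " + str.length() + ", bytes: " + utf8ByteLength(str));
        System.out.println("byte length overruns: " + byteLengthOverruns(str));
        System.out.println("substr from 2: " + substrFrom(str, 2));
    }
}
```

For pure ASCII input the byte count and the character count agree, which would explain why a mix-up of this kind only shows up on rows containing multi-byte characters.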
[jira] [Commented] (HIVE-2556) upgrade script 008-HIVE-2246.mysql.sql contains syntax errors
[ https://issues.apache.org/jira/browse/HIVE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146715#comment-13146715 ] Paul Yang commented on HIVE-2556: - +1 Will commit. upgrade script 008-HIVE-2246.mysql.sql contains syntax errors - Key: HIVE-2556 URL: https://issues.apache.org/jira/browse/HIVE-2556 Project: Hive Issue Type: Bug Affects Versions: 0.8.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0, 0.9.0 Attachments: D309.1.patch, HIVE-2556.patch source script_name gives syntax errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2556) upgrade script 008-HIVE-2246.mysql.sql contains syntax errors
[ https://issues.apache.org/jira/browse/HIVE-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2556: Resolution: Fixed Release Note: Committed to trunk and branch 0.8.0 Thanks Ning! Status: Resolved (was: Patch Available) upgrade script 008-HIVE-2246.mysql.sql contains syntax errors - Key: HIVE-2556 URL: https://issues.apache.org/jira/browse/HIVE-2556 Project: Hive Issue Type: Bug Affects Versions: 0.8.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0, 0.9.0 Attachments: D309.1.patch, HIVE-2556.patch source script_name gives syntax errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2366) Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the old COLUMNS table
[ https://issues.apache.org/jira/browse/HIVE-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13143630#comment-13143630 ] Paul Yang commented on HIVE-2366: - I regenerated the patch based on trunk. Looks good to me as well. Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the old COLUMNS table --- Key: HIVE-2366 URL: https://issues.apache.org/jira/browse/HIVE-2366 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2366.1.patch, HIVE-2366.2.patch The upgrade scripts for the hive metastore in HIVE-2246 do not upgrade the indexes. They also need to rename the old COLUMNS table after migration so that old clients will not accidentally access the COLUMNS table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors
[ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138065#comment-13138065 ] Paul Yang commented on HIVE-2368: - Backporting this to branch-0.8 as well. Slow dropping of partitions caused by full listing of storage descriptors - Key: HIVE-2368 URL: https://issues.apache.org/jira/browse/HIVE-2368 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.9.0 Attachments: HIVE-2368.1.patch To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs. This can severely slow down dropping partitions. We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
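The idea in the description, asking for at most one storage descriptor because only existence matters, can be sketched as follows. This is an in-memory Java illustration under stated assumptions, not the metastore code: the `StorageDescriptor` class, the list-based lookup, and the `maxSDs` parameter shape are hypothetical, and only the method name `listStorageDescriptorsWithCD` comes from the issue text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnDescriptorCheckSketch {
    /** Hypothetical storage descriptor that records which column descriptor it uses. */
    static class StorageDescriptor {
        final long cdId;

        StorageDescriptor(long cdId) {
            this.cdId = cdId;
        }
    }

    /** Returns at most maxSDs storage descriptors that reference cdId, stopping early. */
    static List<StorageDescriptor> listStorageDescriptorsWithCD(
            List<StorageDescriptor> all, long cdId, int maxSDs) {
        List<StorageDescriptor> result = new ArrayList<>();
        for (StorageDescriptor sd : all) {
            if (result.size() >= maxSDs) {
                break; // enough matches found; no need to scan or return the rest
            }
            if (sd.cdId == cdId) {
                result.add(sd);
            }
        }
        return result;
    }

    /** Existential check: a column descriptor is unused if not even one SD references it. */
    static boolean isColumnDescriptorUnused(List<StorageDescriptor> all, long cdId) {
        return listStorageDescriptorsWithCD(all, cdId, 1).isEmpty();
    }

    public static void main(String[] args) {
        List<StorageDescriptor> sds = Arrays.asList(
                new StorageDescriptor(7), new StorageDescriptor(7), new StorageDescriptor(9));
        System.out.println("CD 7 unused: " + isColumnDescriptorUnused(sds, 7));
        System.out.println("CD 8 unused: " + isColumnDescriptorUnused(sds, 8));
    }
}
```

In a database-backed store the same effect would presumably come from limiting the query result size rather than from breaking out of a loop, but the principle is the same: the caller never materializes more rows than it needs to answer a yes/no question.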
RE: Apache Hive 0.8.0 Release Candidate 0
Doing it now... Updated task at: https://issues.apache.org/jira/browse/HIVE-2368

-----Original Message-----
From: John Sichi [mailto:jsi...@fb.com]
Sent: Thursday, October 27, 2011 5:07 PM
To: dev@hive.apache.org
Cc: dev@hive.apache.org; Carl Steinbach
Subject: Re: Apache Hive 0.8.0 Release Candidate 0

Sure, do you want to go ahead and backport it?

JVS

On Oct 27, 2011, at 4:01 PM, Paul Yang py...@fb.com wrote:

Can we include HIVE-2368 in the RC? Otherwise, drop performance will be too slow for tables with lots of partitions.

-----Original Message-----
From: John Sichi [mailto:jsi...@fb.com]
Sent: Thursday, October 27, 2011 2:03 PM
To: dev@hive.apache.org; Carl Steinbach
Subject: Re: Apache Hive 0.8.0 Release Candidate 0

I did the following testing on this RC.

Getting Started: basic sanity testing with pokes table
Source: build and package (did not run ant test yet, will do so on next RC)
PDK: built and ran examples/test-plugin
Upgrade: tested script for MySQL upgrade from 0.7.1 (starting with fresh 0.7.0 schema creation script) to 0.8 using partitioned invites table

I logged HIVE-2529 for the missing PostgreSQL metastore upgrade script.

RELEASE_NOTES.txt has regressed to the 0.6 content (0.7.1 has the correct content).

For README.txt, maybe we should use the new wording from the wiki for the first line since it's now the official project trademark description? And throw in a bunch of (TM)'s for good measure, since we love them so much?

Looking forward to RC1!

JVS

On Oct 21, 2011, at 2:51 PM, Carl Steinbach wrote:

Apache Hive 0.8.0 Release Candidate 0 is available here:
http://people.apache.org/~cws/hive-0.8.0-candidate-0/

This RC does not include patches for the following tickets which are currently marked as blockers:
* HIVE-2391. Published POMs in Maven repo are incorrect
* HIVE-2521. Update wiki links in README

We plan to include patches for both of these tickets in RC1, but would appreciate it if folks would look at this RC in an attempt to find any other blockers.

Thanks!

Carl
[jira] [Updated] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors
[ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2368: Summary: Slow dropping of partitions caused by full listing of storage descriptors (was: Determining whether a Column Descriptor is unused may take too long) Slow dropping of partitions caused by full listing of storage descriptors - Key: HIVE-2368 URL: https://issues.apache.org/jira/browse/HIVE-2368 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Attachments: HIVE-2368.1.patch To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs. This can severely slow down dropping partitions. We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors
[ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13135387#comment-13135387 ] Paul Yang commented on HIVE-2368: - Committed. Thanks Sohan! Slow dropping of partitions caused by full listing of storage descriptors - Key: HIVE-2368 URL: https://issues.apache.org/jira/browse/HIVE-2368 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.9.0 Attachments: HIVE-2368.1.patch To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs. This can severely slow down dropping partitions. We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-2368) Slow dropping of partitions caused by full listing of storage descriptors
[ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang resolved HIVE-2368. - Resolution: Fixed Fix Version/s: 0.9.0 Slow dropping of partitions caused by full listing of storage descriptors - Key: HIVE-2368 URL: https://issues.apache.org/jira/browse/HIVE-2368 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.9.0 Attachments: HIVE-2368.1.patch To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs. This can severely slow down dropping partitions. We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2502) Add whitelist for hosts used in table/partition locations
[ https://issues.apache.org/jira/browse/HIVE-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132180#comment-13132180 ] Paul Yang commented on HIVE-2502: - Got failures in a few tests: alter_view_rename.q create_or_replace_view.q create_view.q recursive_view.q Seems like the views case isn't properly handled. Add whitelist for hosts used in table/partition locations - Key: HIVE-2502 URL: https://issues.apache.org/jira/browse/HIVE-2502 Project: Hive Issue Type: Improvement Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2502.1.patch.txt, HIVE-2502.2.patch.txt, HIVE-2502.3.patch.txt Add a whitelist of host names that can be checked before creating/altering a table/partition to verify that the location is acceptable. The whitelist should be empty by default, and should be configurable. The check should default to pass if there is no host in the location, or the whitelist is empty. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
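The rule stated in the description (pass when the whitelist is empty or the location has no host, otherwise require the host to be listed) can be sketched in Java as follows. This is an illustration only and not the patch attached to the issue: the class name, the method name, the example host names, and the assumption that the whitelist stores lower-case host names are all hypothetical.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

class LocationWhitelistSketch {
    /**
     * Returns true if the location may be used. The check passes by default when
     * the whitelist is empty or when the location carries no host component.
     */
    static boolean isLocationAllowed(String location, Set<String> whitelist) {
        if (whitelist == null || whitelist.isEmpty()) {
            return true; // whitelist not configured
        }
        String host = URI.create(location).getHost();
        if (host == null) {
            return true; // e.g. a bare path such as /warehouse/t
        }
        // Assumption: the whitelist holds lower-case host names.
        return whitelist.contains(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> allowed = Collections.singleton("nn1.example.com");
        System.out.println(isLocationAllowed("hdfs://nn1.example.com:8020/warehouse/t", allowed));
        System.out.println(isLocationAllowed("hdfs://other.example.com/warehouse/t", allowed));
        System.out.println(isLocationAllowed("/warehouse/t", allowed));
    }
}
```

Whatever form the real check takes, it would need to run on every code path that creates or alters an object with a location; objects that have no meaningful location need the pass-by-default branch.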
[jira] [Commented] (HIVE-2368) Determining whether a Column Descriptor is unused may take too long
[ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132215#comment-13132215 ] Paul Yang commented on HIVE-2368: - The patch was good, will test and commit. Determining whether a Column Descriptor is unused may take too long --- Key: HIVE-2368 URL: https://issues.apache.org/jira/browse/HIVE-2368 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Attachments: HIVE-2368.1.patch To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs. This can severely slow down dropping partitions. We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2502) Add whitelist for hosts used in table/partition locations
[ https://issues.apache.org/jira/browse/HIVE-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127844#comment-13127844 ] Paul Yang commented on HIVE-2502: - +1 Will test and commit. Add whitelist for hosts used in table/partition locations - Key: HIVE-2502 URL: https://issues.apache.org/jira/browse/HIVE-2502 Project: Hive Issue Type: Improvement Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2502.1.patch.txt, HIVE-2502.2.patch.txt, HIVE-2502.3.patch.txt Add a whitelist of host names that can be checked before creating/altering a table/partition to verify that the location is acceptable. The whitelist should be empty by default, and should be configurable. The check should default to pass if there is no host in the location, or the whitelist is empty. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083751#comment-13083751 ]

Paul Yang commented on HIVE-2246:

There have been some issues identified with this patch. We will be doing some additional testing, but we might roll back so that we don't leave trunk in an unstable state.

Dedupe tables' column schemas from partitions in the metastore db

Key: HIVE-2246
URL: https://issues.apache.org/jira/browse/HIVE-2246
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Fix For: 0.8.0
Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch, HIVE-2246.8.patch

Note: this patch proposes a schema change, and is therefore incompatible with the current metastore.

We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition.

An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table.

Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors.

Please see the latest review board for additional implementation details.
[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083854#comment-13083854 ] Paul Yang commented on HIVE-2322: - +1. Tested and will commit. Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, HIVE-2322.4.patch, HIVE-2322.5.patch We store metadata about ColumnarSerDes in the metastore, so it should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are only shown as from deserializer. Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083865#comment-13083865 ] Paul Yang commented on HIVE-2322: - Committed. Thanks Sohan! Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, HIVE-2322.4.patch, HIVE-2322.5.patch We store metadata about ColumnarSerDes in the metastore, so it should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are only shown as from deserializer. Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081956#comment-13081956 ]

Paul Yang commented on HIVE-2246:

+1 - tests passed. Will commit.

Dedupe tables' column schemas from partitions in the metastore db

Key: HIVE-2246
URL: https://issues.apache.org/jira/browse/HIVE-2246
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch, HIVE-2246.8.patch

Note: this patch proposes a schema change, and is therefore incompatible with the current metastore.

We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition.

An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table.

Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors.

Please see the latest review board for additional implementation details.
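To make the row-count estimate in the description concrete, here is a small Java calculation. The formulas are the ones given in the issue text; the class name and the sample sizes (1,000,000 partitions, 1,000 tables, 50 columns per table on average) are made up for illustration and are not measurements from any real metastore.

```java
class ColumnRowEstimateSketch {
    /** Rows in COLUMNS before the change: one column list per partition and per table. */
    static long rowsBefore(long partitions, long tables, long avgColumnsPerTable) {
        return (partitions + tables) * avgColumnsPerTable;
    }

    /** Rows after the change: one column list per table, shared by its partitions. */
    static long rowsAfter(long tables, long avgColumnsPerTable) {
        return tables * avgColumnsPerTable;
    }

    public static void main(String[] args) {
        long partitions = 1_000_000L; // hypothetical sample size
        long tables = 1_000L;         // hypothetical sample size
        long avgColumns = 50L;        // hypothetical sample size

        System.out.println("before: " + rowsBefore(partitions, tables, avgColumns));
        System.out.println("after:  " + rowsAfter(tables, avgColumns));
    }
}
```

With these sample sizes the estimate drops from 50,050,000 rows to 50,000 rows. The extra Column Descriptor rows grow with the number of tables (somewhat more where schema evolution leaves a table with several descriptors), so they stay small next to the savings.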
[jira] [Resolved] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang resolved HIVE-2246. - Resolution: Fixed Fix Version/s: 0.8.0 Release Note: This makes an incompatible change in the metastore DB table schema from previous versions (0.8). Older metastores created with previous versions of Hive will need to be upgraded with the supplied scripts. Committed. Thanks Sohan! Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch, HIVE-2246.8.patch Note: this patch proposes a schema change, and is therefore incompatible with the current metastore. We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns pertable) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. 
Please see the latest review board for additional implementation details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
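To make the row-count estimate in the HIVE-2246 description concrete, here is a minimal Python sketch of the two formulas. The function names and the sample numbers are illustrative assumptions, not figures from the issue, and the "after" estimate ignores the extra Column Descriptors that schema evolution can create.

```python
def columns_rows_before(num_tables, num_partitions, avg_columns_per_table):
    """Rough COLUMNS row count when every partition stores its own column list."""
    return (num_partitions + num_tables) * avg_columns_per_table


def columns_rows_after(num_tables, avg_columns_per_table):
    """Rough row count when partitions share their table's Column Descriptor.

    Assumes one Column Descriptor per table (no schema evolution).
    """
    return num_tables * avg_columns_per_table


if __name__ == "__main__":
    # Hypothetical warehouse: 1,000 tables, 1,000,000 partitions, 20 columns each.
    print(columns_rows_before(1000, 1000000, 20))
    print(columns_rows_after(1000, 20))
```

With these made-up inputs the partition term dominates the "before" count, which is why removing it shrinks the table so sharply.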
[jira] [Updated] (HIVE-2319) Calling alter_table after changing partition comment throws an exception
[ https://issues.apache.org/jira/browse/HIVE-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2319: Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed. Thanks Sohan! Calling alter_table after changing partition comment throws an exception Key: HIVE-2319 URL: https://issues.apache.org/jira/browse/HIVE-2319 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2319.2.patch, HIVE-2319.3.patch, HIVE-2319.4.patch Altering a table's partition key comments raises an InvalidOperationException. The partition key name and type should not be mutable, but the comment should be able to get changed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2246: Dedupe tables' column schemas from partitions in the metastore db
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1183/#review1309 --- Also, can you add migration scripts for other DBs? trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql https://reviews.apache.org/r/1183/#comment2982 Typo trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/1183/#comment2979 The check and the delete should be in the same transaction, as it's possible for a reference to a CD to be created after the check but before the delete. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/1183/#comment2981 How does this drop the storage descriptor? trunk/metastore/src/model/package.jdo https://reviews.apache.org/r/1183/#comment2968 Fix indent - Paul On 2011-08-05 20:49:19, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1183/ --- (Updated 2011-08-05 20:49:19) Review request for hive, Ning Zhang and Paul Yang. Summary --- This patch tries to make minimal changes to the API while keeping migration short and somewhat easy to revert. The new schema can be described as follows: - CDS is a table corresponding to Column Descriptor objects. Currently, it only stores a CD_ID. - COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns. A Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to the CD_ID to which it belongs. - SDS was modified to reference a Column Descriptor. So SDS now has a foreign key to a CD_ID which describes its columns. During migration, we create Column Descriptors for tables in a straightforward manner: their columns are now just wrapped inside a column descriptor. The SDS of partitions use their parent table's column descriptor, since currently a partition and its table share the same list of columns.
When altering or adding a partition, give it its parent table's column descriptor IF the columns they describe are the same. Otherwise, create a new column descriptor for its columns. When adding or altering a table, create a new column descriptor every time. Whenever you drop a storage descriptor (e.g., when dropping tables or partitions), check to see if the related column descriptor has any other references in the table. That is, check to see if any other storage descriptors point to that column descriptor. If none do, then delete that column descriptor. This check is in place so we don't have unreferenced column descriptors and columns hanging around after schema evolution for tables. This addresses bug HIVE-2246. https://issues.apache.org/jira/browse/HIVE-2246 Diffs - trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1153927 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1153927 trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1153927 trunk/metastore/src/model/package.jdo 1153927 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153927 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/MetaDataFormatUtils.java 1153927 Diff: https://reviews.apache.org/r/1183/diff Testing --- Passes facebook's regression testing and all existing test cases. In one instance, before migration, the overhead involved with storage descriptors and columns was ~11 GB. After migration, the overhead was ~1.5 GB. Thanks, Sohan
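The drop-time cleanup described above, together with the review comment that the check and the delete belong in one transaction, might be sketched as follows. This is a minimal Python illustration against an in-memory SQLite database, not the ObjectStore.java code: only the table names CDS, COLUMNS_V2 and SDS and the CD_ID key come from the review request, while the other columns and the function names are assumptions.

```python
import sqlite3


def create_schema(conn):
    """Create a stripped-down version of the three tables named in the review."""
    conn.executescript("""
        CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY);
        CREATE TABLE COLUMNS_V2 (
            CD_ID INTEGER REFERENCES CDS(CD_ID),
            COLUMN_NAME TEXT,   -- assumed column, for illustration only
            TYPE_NAME TEXT      -- assumed column, for illustration only
        );
        CREATE TABLE SDS (
            SD_ID INTEGER PRIMARY KEY,
            CD_ID INTEGER REFERENCES CDS(CD_ID)
        );
    """)


def drop_storage_descriptor(conn, sd_id):
    """Drop one storage descriptor and its column descriptor if now unreferenced.

    Returns True when the column descriptor was deleted as well.
    """
    with conn:  # commit on success, roll back if any statement raises
        row = conn.execute(
            "SELECT CD_ID FROM SDS WHERE SD_ID = ?", (sd_id,)).fetchone()
        if row is None:
            return False
        cd_id = row[0]
        conn.execute("DELETE FROM SDS WHERE SD_ID = ?", (sd_id,))
        remaining = conn.execute(
            "SELECT COUNT(*) FROM SDS WHERE CD_ID = ?", (cd_id,)).fetchone()[0]
        if remaining:
            return False  # another table or partition still uses these columns
        conn.execute("DELETE FROM COLUMNS_V2 WHERE CD_ID = ?", (cd_id,))
        conn.execute("DELETE FROM CDS WHERE CD_ID = ?", (cd_id,))
        return True
```

Grouping the statements only protects against a concurrent writer adding a reference between the check and the delete if the database's isolation level or locking prevents it, so a real implementation would still need to confirm that for each supported metastore DB.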
[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079500#comment-13079500 ] Paul Yang commented on HIVE-2322: - Can you regenerate this patch? I'm getting some patch failures. Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch We store metadata about ColumnarSerDes in the metastore, so it should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are only shown as from deserializer. Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2319) Calling alter_table after changing partition comment throws an exception
[ https://issues.apache.org/jira/browse/HIVE-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079505#comment-13079505 ] Paul Yang commented on HIVE-2319: - +1 Will test and commit Calling alter_table after changing partition comment throws an exception Key: HIVE-2319 URL: https://issues.apache.org/jira/browse/HIVE-2319 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2319.2.patch, HIVE-2319.3.patch, HIVE-2319.4.patch Altering a table's partition key comments raises an InvalidOperationException. The partition key name and type should not be mutable, but the comment should be able to get changed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: Support archiving for multiple partitions if the table is partitioned by multiple columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1259/#review1278 --- trunk/data/conf/hive-site.xml https://reviews.apache.org/r/1259/#comment2921 Why is a hook needed to update this count? trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/1259/#comment2924 One possible issue is that if the user changes the value of this through the CLI (i.e. with a set xxx=yyy;), it wouldn't take effect. It should be read in the constructor or in the methods. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/1259/#comment2923 Should be info or debug, not error trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/1259/#comment2927 Why are we saving unboundKey? Can we just break on the first instance when partSpec does not contain fs.getName()? trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/1259/#comment2928 Typo trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/1259/#comment2929 Change to something like 'Cannot drop a subset of partitions in an archive' trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/1259/#comment2930 This is going to be feature that we'll need to support. The script that drops based on retention can't unarchive as it will take too long. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MetaUtils.java https://reviews.apache.org/r/1259/#comment2931 Need more context for these log messages - i.e. what kind of partitions were found? trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MetaUtils.java https://reviews.apache.org/r/1259/#comment2934 The max parameter for getPartitions is not yet implemented. It needs to be or otherwise this will hang for large tables. 
trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateArchivedCounter.java https://reviews.apache.org/r/1259/#comment2919 Comment from a different hook? trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateArchivedCounter.java https://reviews.apache.org/r/1259/#comment2920 Formatting - add spaces trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java https://reviews.apache.org/r/1259/#comment2932 More context trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java https://reviews.apache.org/r/1259/#comment2933 Remove commented code - Paul On 2011-08-02 21:34:07, Marcin Kurczych wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1259/ --- (Updated 2011-08-02 21:34:07) Review request for hive, Paul Yang and namit jain. Summary --- Allowing archiving at chosen level. When table is partitioned by ds, hr, min it allows archiving at ds level, hr level and min level. Corresponding syntaxes are: ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08'); ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08', hr='11'); ALTER TABLE test ARCHIVE PARTITION (ds='2008-04-08', hr='11', min='30'); You cannot do much to archived partitions. You can read them. You cannot write to them / overwrite them. You can drop single archived partitions, but not parts of bigger archives. 
Diffs - trunk/data/conf/hive-site.xml 1153271 trunk/metastore/if/hive_metastore.thrift 1153271 trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h 1153271 trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp 1153271 trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Constants.java 1153271 trunk/metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_constants.php 1153271 trunk/metastore/src/gen/thrift/gen-py/hive_metastore/constants.py 1153271 trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb 1153271 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MetaUtils.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateArchivedCounter.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/DummyPartition.java 1153271 trunk/ql/src/java/org/apache/hadoop/hive/ql
Re: Review Request: HIVE-2319: Calling alter_table after changing partition key comment throws an exception
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1213/#review1215 --- trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java https://reviews.apache.org/r/1213/#comment2712 Actually, we can't allow for a different ordering because that would imply a different directory structure. We should just make sure that everything but the comments are equal. trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java https://reviews.apache.org/r/1213/#comment2713 Message should say - it was able to change when it shouldn't have? - Paul On 2011-07-28 07:06:24, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1213/ --- (Updated 2011-07-28 07:06:24) Review request for hive and Paul Yang. Summary --- Altering a table's partition key comments raises an InvalidOperationException. The partition key name and type should not be mutable, but the comment should be able to get changed. This addresses bug HIVE-2319. https://issues.apache.org/jira/browse/HIVE-2319 Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 1151219 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1151219 Diff: https://reviews.apache.org/r/1213/diff Testing --- Added some test cases to HiveMetaStore that pass. Thanks, Sohan
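The rule from the review above (same partition keys in the same order, with only the comments free to differ) can be illustrated with a short Python sketch. The `FieldSchema` tuple and the function name are stand-ins chosen for this example; they are not the classes or methods used in HiveAlterHandler.java.

```python
from collections import namedtuple

# Illustrative stand-in for a partition key definition.
FieldSchema = namedtuple("FieldSchema", ["name", "type", "comment"])


def partition_keys_unchanged(old_keys, new_keys):
    """True if names and types match position by position; comments are ignored.

    Order matters because the partition key order determines the directory layout.
    """
    if len(old_keys) != len(new_keys):
        return False
    return all(old.name == new.name and old.type == new.type
               for old, new in zip(old_keys, new_keys))
```

For example, changing the comment on `ds` passes the check, while swapping `ds` and `hr` or changing a key's type does not.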
[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072565#comment-13072565 ] Paul Yang commented on HIVE-2322: - +1 Will test and commit Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2322.1.patch We store metadata about ColumnarSerDes in the metastore, so it should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are only shown as from deserializer. Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071291#comment-13071291 ] Paul Yang commented on HIVE-2226: - Committed. Thanks Sohan! Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similar to the one HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2226: Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similar to the one HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2309: Attachment: HIVE-2309.1.patch Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10:
{code}
>>> re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1)
'10'
>>> re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1)
'001210'
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
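The behaviour reported in HIVE-2309 can be reproduced with a small self-contained Python script. `REPORTED` is the pattern quoted in the issue; `RELAXED` is a hypothetical variant that accepts a multi-digit attempt suffix, shown only to illustrate the kind of change that avoids the problem and not necessarily what the attached patches do.

```python
import re

# Pattern quoted in the report: the optional attempt suffix allows a single digit.
REPORTED = re.compile(r'^.*?([0-9]+)(_[0-9])?(\..*)?$')

# Hypothetical variant: the attempt suffix may have one or more digits.
RELAXED = re.compile(r'^.*?([0-9]+)(_[0-9]+)?(\..*)?$')


def task_id(pattern, filename):
    """Return the first capture group (the task id), or None if nothing matches."""
    match = pattern.match(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ("attempt_201107090429_64965_m_001210_9",
                 "attempt_201107090429_64965_m_001210_10"):
        print(name, task_id(REPORTED, name), task_id(RELAXED, name))
```

With the reported pattern, a two-digit attempt number cannot be consumed by the one-digit suffix group, so the lazy prefix keeps growing until the attempt number itself is captured as the task id.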
[jira] [Updated] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2309: Attachment: HIVE-2309.2.patch Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10:
{code}
>>> re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1)
'10'
>>> re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1)
'001210'
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070875#comment-13070875 ] Paul Yang commented on HIVE-2226: - +1 Will test and commit Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similar to the one HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2301) Throw error when attempting to create a column with the same name as a partition column
Throw error when attempting to create a column with the same name as a partition column --- Key: HIVE-2301 URL: https://issues.apache.org/jira/browse/HIVE-2301 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.8.0 Reporter: Paul Yang Priority: Minor If an alter table is run to rename a column to the same name as a partition column, the alter will succeed. However, subsequent operations on that table will fail.
{code}
hive> create table tmp_pyang_test (key string) partitioned by (ds string);
OK
Time taken: 4.773 seconds
hive> alter table tmp_pyang_test replace columns (ds string);
OK
Time taken: 1.254 seconds
hive> describe tmp_pyang_test;
FAILED: Error in metadata: Partition column name ds conflicts with table columns.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
hive>
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
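The validation requested in HIVE-2301 amounts to rejecting the alter up front when a new column name collides with a partition column. A minimal Python sketch of such a check follows; the function name and the error text are assumptions made for illustration, and the comparison is case-insensitive on the assumption that Hive treats column names that way.

```python
def check_no_partition_conflict(columns, partition_columns):
    """Raise ValueError if any proposed column name matches a partition column."""
    partition_names = {name.lower() for name in partition_columns}
    conflicts = [name for name in columns if name.lower() in partition_names]
    if conflicts:
        raise ValueError(
            "Column name(s) conflict with partition columns: "
            + ", ".join(conflicts))


if __name__ == "__main__":
    check_no_partition_conflict(["key"], ["ds"])      # accepted
    try:
        check_no_partition_conflict(["ds"], ["ds"])   # the case from the report
    except ValueError as err:
        print(err)
```

Running the check before the metadata is written would turn the silent success in the transcript into an immediate error, leaving the table usable.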
[jira] [Updated] (HIVE-2224) Ability to add partitions atomically
[ https://issues.apache.org/jira/browse/HIVE-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2224: Summary: Ability to add partitions atomically (was: Ability to add_partitions, and atomically) Ability to add partitions atomically Key: HIVE-2224 URL: https://issues.apache.org/jira/browse/HIVE-2224 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-2224.patch I'd like to see an atomic version of the add_partitions() call. Whether this is to be done by config to affect add_partitions() behaviour (not my preference) or just changing add_partitions() default behaviour (my preference, but likely to affect current behaviour, so will need others' input) or by making a new add_partitions_atomic() call depends on discussion. This looks relatively doable to implement (will need a dependent add_partition_core to not do a ms.commit_partition() early, and to cache list of directories created to remove on rollback, and a list of AddPartitionEvent to trigger in one shot later) Thoughts? This also seems like something to implement for allowing HIVE-1805. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2224) Ability to add partitions atomically
[ https://issues.apache.org/jira/browse/HIVE-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13068536#comment-13068536 ] Paul Yang commented on HIVE-2224: - Seems like it was an issue with the machine. But it has been committed - thanks Sushanth! Ability to add partitions atomically Key: HIVE-2224 URL: https://issues.apache.org/jira/browse/HIVE-2224 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-2224.patch I'd like to see an atomic version of the add_partitions() call. Whether this is to be done by config to affect add_partitions() behaviour (not my preference) or just changing add_partitions() default behaviour (my preference, but likely to affect current behaviour, so will need others' input) or by making a new add_partitions_atomic() call depends on discussion. This looks relatively doable to implement (will need a dependent add_partition_core to not do a ms.commit_partition() early, and to cache list of directories created to remove on rollback, and a list of AddPartitionEvent to trigger in one shot later) Thoughts? This also seems like something to implement for allowing HIVE-1805. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2224) Ability to add partitions atomically
[ https://issues.apache.org/jira/browse/HIVE-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2224: Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Ability to add partitions atomically Key: HIVE-2224 URL: https://issues.apache.org/jira/browse/HIVE-2224 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 0.8.0 Attachments: HIVE-2224.patch I'd like to see an atomic version of the add_partitions() call. Whether this is to be done by config to affect add_partitions() behaviour (not my preference) or just changing add_partitions() default behaviour (my preference, but likely to affect current behaviour, so will need others' input) or by making a new add_partitions_atomic() call depends on discussion. This looks relatively doable to implement (will need a dependent add_partition_core to not do a ms.commit_partition() early, and to cache list of directories created to remove on rollback, and a list of AddPartitionEvent to trigger in one shot later) Thoughts? This also seems like something to implement for allowing HIVE-1805. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2224) Ability to add_partitions, and atomically
[ https://issues.apache.org/jira/browse/HIVE-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067209#comment-13067209 ] Paul Yang commented on HIVE-2224: - Sorry for the delay, but I've been running into some test issues that are likely not caused by your patch. Ability to add_partitions, and atomically - Key: HIVE-2224 URL: https://issues.apache.org/jira/browse/HIVE-2224 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-2224.patch I'd like to see an atomic version of the add_partitions() call. Whether this is to be done by config to affect add_partitions() behaviour (not my preference) or just changing add_partitions() default behaviour (my preference, but likely to affect current behaviour, so will need others' input) or by making a new add_partitions_atomic() call depends on discussion. This looks relatively doable to implement (will need a dependent add_partition_core to not do a ms.commit_partition() early, and to cache list of directories created to remove on rollback, and a list of AddPartitionEvent to trigger in one shot later) Thoughts? This also seems like something to implement for allowing HIVE-1805. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2224 : Exposing add_partitions() from hive metastore, making it atomic
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/999/#review1068 --- http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/999/#comment2160 This part is a little unusual, as Entry objects are mostly used during iteration - Paul On 2011-07-07 23:20:22, Sushanth Sowmyan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/999/ --- (Updated 2011-07-07 23:20:22) Review request for hive. Summary --- As per HIVE-2224 ( https://issues.apache.org/jira/browse/HIVE-2224 ), this patch does the following: + Exposing add_partitions() from the thrift metastore api + Making add_partitions() atomic This addresses bug HIVE-2224. https://issues.apache.org/jira/browse/HIVE-2224 Diffs - http://svn.apache.org/repos/asf/hive/trunk/metastore/if/hive_metastore.thrift 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1142116 
http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1142116 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1142116 Diff: https://reviews.apache.org/r/999/diff Testing --- Modified TestHiveMetaStore.partitionTester() to add tests for the following scenarios:
+ add_partitions(empty list): no exceptions thrown: works
+ add_partitions(list containing 3 partitions): works, verified that partitions exist
+ add_partitions(list containing 2 partitions, where one of them has key values identical to an original partition, and another has a directory already created): verified that the call causes an exception, that directories are unchanged (the one with the pre-existing directory continues to exist), and that no additional partition was published
+ add_partitions(list with a single partition): works, does not fault on duplicate as a result of a partial publish above
+ verified that all partitions created above exist: works
All of the above is called from both TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore. Thanks, Sushanth
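The all-or-nothing behaviour exercised by the tests above (remember which directories the call created, remove only those on failure, publish nothing until every partition has succeeded) can be sketched in Python as below. This is an illustration of the idea from HIVE-2224 and not the HiveMetaStore.java implementation: the function name, the use of a plain set as the partition registry, and the single-level partition names are all assumptions.

```python
import os
import shutil


def add_partitions_atomic(existing, root, new_partitions):
    """Add every partition in new_partitions, or none of them.

    existing: set of partition names already published (stand-in for the metastore).
    root: table directory; each partition name is one sub-directory of it.
    """
    created_dirs = []   # only directories this call created, for rollback
    staged = []         # names to publish once everything has succeeded
    try:
        for name in new_partitions:
            if name in existing or name in staged:
                raise ValueError("Partition already exists: " + name)
            path = os.path.join(root, name)
            if not os.path.isdir(path):
                os.makedirs(path)
                created_dirs.append(path)
            staged.append(name)
    except Exception:
        # Roll back: remove what we created, leave pre-existing directories alone.
        for path in reversed(created_dirs):
            shutil.rmtree(path, ignore_errors=True)
        raise
    existing.update(staged)  # publish in one step
    return staged
```

Deferring the publish step to the end mirrors the suggestion in the issue to avoid committing each partition early and to fire the add-partition events in one shot afterwards.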
[jira] [Commented] (HIVE-2224) Ability to add_partitions, and atomically
[ https://issues.apache.org/jira/browse/HIVE-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13065625#comment-13065625 ] Paul Yang commented on HIVE-2224: - +1 Will test and commit Ability to add_partitions, and atomically - Key: HIVE-2224 URL: https://issues.apache.org/jira/browse/HIVE-2224 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-2224.patch I'd like to see an atomic version of the add_partitions() call. Whether this is to be done by config to affect add_partitions() behaviour (not my preference) or just changing add_partitions() default behaviour (my preference, but likely to affect current behaviour, so will need others' input) or by making a new add_partitions_atomic() call depends on discussion. This looks relatively doable to implement (will need a dependent add_partition_core to not do a ms.commit_partition() early, and to cache list of directories created to remove on rollback, and a list of AddPartitionEvent to trigger in one shot later) Thoughts? This also seems like something to implement for allowing HIVE-1805. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners
[ https://issues.apache.org/jira/browse/HIVE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang resolved HIVE-2194. - Resolution: Fixed Fix Version/s: 0.8.0 Committed. Thanks Sohan! Add actions for alter table and alter partition events for metastore event listeners Key: HIVE-2194 URL: https://issues.apache.org/jira/browse/HIVE-2194 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2194.1.patch, HIVE-2194.3.patch HIVE-2038 introduced the MetaStoreEventListener abstract class that defines actions to be performed after particular events on a metastore. Improve upon that class by adding events to be performed on alter table and alter partition actions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2219) Make alter table drop partition more efficient
[ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062179#comment-13062179 ] Paul Yang commented on HIVE-2219: - I likely mixed up the RB and JIRA versions - looking at HIVE-2275 now. Make alter table drop partition more efficient Key: HIVE-2219 URL: https://issues.apache.org/jira/browse/HIVE-2219 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2219.1.patch, HIVE-2219.2.patch The current function dropTable() that handles dropping multiple partitions is somewhat inefficient. For each partition you want to drop, it loops through each partition in the table to see if the partition exists. This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table. The running time of this function can be improved, which is useful for tables with many partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
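One common way to avoid the O(mn) scan described in the issue is to put the table's partition names into a hash set once and then look up each partition to drop. The sketch below shows that idea only; it is an assumption about how such an improvement could be made, not a description of what the attached patches do, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DropPartitionLookupSketch {

    /** Returns the requested partition names that exist in the table, in request order. */
    static List<String> existingPartitionsToDrop(List<String> tablePartitions,
                                                 List<String> toDrop) {
        Set<String> existing = new HashSet<>(tablePartitions); // built once: O(n)
        List<String> found = new ArrayList<>();
        for (String name : toDrop) {                           // m lookups, O(1) expected each
            if (existing.contains(name)) {
                found.add(name);
            }
        }
        return found;
    }
}
```

With this shape the expected cost is O(m + n) rather than O(mn), at the price of holding all n partition names in memory at once.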
[jira] [Commented] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners
[ https://issues.apache.org/jira/browse/HIVE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062239#comment-13062239 ] Paul Yang commented on HIVE-2194: - Sohan just mentioned that there was a mismatch between the RB and JIRA versions for this one too. This will require another patch. Add actions for alter table and alter partition events for metastore event listeners Key: HIVE-2194 URL: https://issues.apache.org/jira/browse/HIVE-2194 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2194.1.patch, HIVE-2194.3.patch, HIVE-2194.4.patch HIVE-2038 introduced the MetaStoreEventListener abstract class that defines actions to be performed after particular events on a metastore. Improve upon that class by adding events to be performed on alter table and alter partition actions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners
[ https://issues.apache.org/jira/browse/HIVE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060720#comment-13060720 ] Paul Yang commented on HIVE-2194: - +1 Will test and commit. Add actions for alter table and alter partition events for metastore event listeners Key: HIVE-2194 URL: https://issues.apache.org/jira/browse/HIVE-2194 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2194.1.patch, HIVE-2194.3.patch HIVE-2038 introduced the MetaStoreEventListener abstract class that defines actions to be performed after particular events on a metastore. Improve upon that class by adding events to be performed on alter table and alter partition actions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2226: Add API to metastore for table filtering based on table properties
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/#review928 --- trunk/metastore/if/hive_metastore.thrift https://reviews.apache.org/r/910/#comment2014 Using the form hive_filter_field_params__<parameter key> seems a little odd. Can't think of an easy way to handle this case though, so it should probably be okay. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/910/#comment2003 I don't think it's possible to create 2 tables with the same name. In which case, there shouldn't be a need for this check. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java https://reviews.apache.org/r/910/#comment2005 We should catch the case where the keyName is invalid - Paul On 2011-06-20 21:04:45, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/ --- (Updated 2011-06-20 21:04:45) Review request for hive and Paul Yang. Summary --- Create a function listTableNamesByFilter that returns a list of names for tables in a database that match a certain filter. The syntax of the filter is similar to the one created by HIVE-1609. You can filter the table list based on owner, last access time, or table parameter key/values. The filtering takes place at the JDO level for efficiency/speed. To create a new kind of table filter, add a constant to thrift.if and a branch in the if statement in generateJDOFilterOverTables() in ExpressionTree.
Example filter statements include:

//translation: owner.matches(".*test.*") and lastAccessTime == 0
filter = Constants.HIVE_FILTER_FIELD_OWNER + " like \".*test.*\" and " + Constants.HIVE_FILTER_FIELD_LAST_ACCESS + " = 0";

//translation: owner = "test_user" and (parameters.get("retention") == "30" || parameters.get("retention") == "90")
filter = Constants.HIVE_FILTER_FIELD_OWNER + " = \"test_user\" and (" + Constants.HIVE_FILTER_FIELD_PARAMS + "retention = \"30\" or " + Constants.HIVE_FILTER_FIELD_PARAMS + "retention = \"90\")"

The filter can currently parse string or integer values, where values interpreted as strings must be in quotes. See the comments in IMetaStoreClient for more usage details/restrictions. This addresses bug HIVE-2226. https://issues.apache.org/jira/browse/HIVE-2226 Diffs - trunk/metastore/if/hive_metastore.thrift 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 1136751 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g 1136751 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751 Diff: https://reviews.apache.org/r/910/diff Testing --- Added test cases to TestHiveMetaStore Thanks, Sohan
Re: Review Request: HIVE-2194: Add actions to MetaStoreEventListener to be performed on alter table and alter partition operations.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/853/#review920 --- trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java https://reviews.apache.org/r/853/#comment1983 Can we check for equality between origP and the old partition instead of comparing fields? trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java https://reviews.apache.org/r/853/#comment1982 Same here trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java https://reviews.apache.org/r/853/#comment1981 origCols.equals(oldCols)? - Paul On 2011-06-21 20:25:04, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/853/ --- (Updated 2011-06-21 20:25:04) Review request for hive and Paul Yang. Summary --- HIVE-2038 introduced the MetaStoreEventListener abstract class that defines actions to be performed after particular events on a metastore. Improve upon that class by adding events to be performed on alter table and alter partition actions. Also, update the hive metastore to call the appropriate functions of the listeners when the events happen. This addresses bug HIVE-2194. 
https://issues.apache.org/jira/browse/HIVE-2194 Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 1138144 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1138144 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 1138144 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1138144 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterPartitionEvent.java PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/events/AlterTableEvent.java PRE-CREATION trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java 1138144 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 1138144 Diff: https://reviews.apache.org/r/853/diff Testing --- Added test cases to TestMetaStoreEventListener. Thanks, Sohan
[jira] [Commented] (HIVE-2219) Make alter table drop partition more efficient
[ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056180#comment-13056180 ] Paul Yang commented on HIVE-2219: - +1 Will test and commit. Make alter table drop partition more efficient Key: HIVE-2219 URL: https://issues.apache.org/jira/browse/HIVE-2219 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2219.1.patch The current function dropTable() that handles dropping multiple partitions is somewhat inefficient. For each partition you want to drop, it loops through each partition in the table to see if the partition exists. This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table. The running time of this function can be improved, which is useful for tables with many partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2213) Optimize partial specification metastore functions
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2213: Summary: Optimize partial specification metastore functions (was: Optimize get_partition_names_ps()) Optimize partial specification metastore functions -- Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2213.1.patch, HIVE-2213.3.patch If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
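To illustrate what pushing a partial specification down could mean, the sketch below turns a partial spec into a single pattern over partition names, so that the match could in principle be evaluated where the names are stored instead of after fetching them all. This is an assumption-laden sketch: the class and method names are hypothetical, it ignores any escaping of special characters in partition values, and it uses java.util.regex quoting, which a real datastore query would need to replace with its own escaping rules.

```java
import java.util.List;
import java.util.regex.Pattern;

class PartialSpecPatternSketch {

    /**
     * Builds a regular expression over partition names such as "ds=2011-06-13/hr=1".
     * A missing or empty value in partVals means "any value" for that column.
     */
    static String partitionNamePattern(List<String> partCols, List<String> partVals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < partCols.size(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            String val = i < partVals.size() ? partVals.get(i) : "";
            sb.append(Pattern.quote(partCols.get(i) + "="));
            // Unspecified columns match any run of characters up to the next separator.
            sb.append(val.isEmpty() ? "[^/]*" : Pattern.quote(val));
        }
        return sb.toString();
    }
}
```

For a table partitioned by ds and hr, the partial spec ["2011-06-13"] yields a pattern that accepts every hr value under that ds and nothing else.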
[jira] [Updated] (HIVE-2213) Optimize partial specification metastore functions
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2213: Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed. Thanks Sohan! Optimize partial specification metastore functions -- Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2213.1.patch, HIVE-2213.3.patch If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051419#comment-13051419 ] Paul Yang commented on HIVE-2213: - If get_partitions_ps_with_auth() was not correct before, then we should fix the method to produce the correct behavior. Ideally, it should have been done in a separate JIRA, but it should be okay to include in this one. +1 looks good though, will test and commit. Optimize get_partition_names_ps() - Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2213.1.patch, HIVE-2213.3.patch If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/#review853 --- trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/878/#comment1862 Line exceeds 100 char limit - Paul On 2011-06-13 21:11:38, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/ --- (Updated 2011-06-13 21:11:38) Review request for hive and Paul Yang. Summary --- If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. This addresses bug HIVE-2213. https://issues.apache.org/jira/browse/HIVE-2213 Diffs - trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 Diff: https://reviews.apache.org/r/878/diff Testing --- Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore. Thanks, Sohan
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050586#comment-13050586 ] Paul Yang commented on HIVE-2213: - Looks good, but can you do a minor update to fix lines longer than 100 chars? Optimize get_partition_names_ps() - Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2213.1.patch If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
RE: [VOTE] Hive 0.7.1 Release Candidate 0
+1 -Original Message- From: Carl Steinbach [mailto:c...@cloudera.com] Sent: Thursday, June 16, 2011 1:22 PM To: dev@hive.apache.org Subject: Re: [VOTE] Hive 0.7.1 Release Candidate 0 +1 from me too. We need one more +1 vote in order to release. On Thu, Jun 16, 2011 at 11:31 AM, John Sichi jsi...@fb.com wrote: +1 from me. I downloaded and verified build+run. I did not verify +tests or upgrades. JVS On Jun 15, 2011, at 12:52 AM, Carl Steinbach wrote: Apache Hive 0.7.1 Release Candidate 0 is available here: http://people.apache.org/~cws/hive-0.7.1-candidate-0/ We need three +1 votes from Hive PMC members in order to release. Please vote. Thanks. Carl
Re: Review Request: HIVE-2226: Add API to metastore for table filtering based on table properties
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/#review855 --- trunk/metastore/if/hive_metastore.thrift https://reviews.apache.org/r/910/#comment1868 Can we rename this to TableQueryFilterType so that it's clear that it's only used for tables? trunk/metastore/if/hive_metastore.thrift https://reviews.apache.org/r/910/#comment1864 Where is this used? trunk/metastore/if/hive_metastore.thrift https://reviews.apache.org/r/910/#comment1866 Hive doesn't really use the retention field. Can you remove operations on this field from the rest of the diff? trunk/metastore/if/hive_metastore.thrift https://reviews.apache.org/r/910/#comment1869 The interface is a little odd because we have to use names like 'owner' or 'retention' in addition to specifying the QueryFilterType. Maybe we should make the field that the QueryFilterType references be called 'field', so you'd have a filter like 'field = .*test_user.*' (for owner) or 'field 90' (for retention) trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/910/#comment1867 Style issue, { should be on same line as if trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java https://reviews.apache.org/r/910/#comment1870 JDO-174 looks like it was fixed a while back - is this still an issue? and may be useful operators for the the parameters field. (e.g. if retention were stored there instead of the member field) - Paul On 2011-06-16 03:13:24, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/910/ --- (Updated 2011-06-16 03:13:24) Review request for hive and Paul Yang. Summary --- Create a function listTableNamesByFilter that returns a list of names for tables in a database that match a certain filter. The syntax of the filter is similar to the one created by HIVE-1609. 
You can filter the table list based on owner, retention, or table parameter key/values. The filtering takes place at the JDO level for efficiency/speed. Added a QueryFilterType enum to easily add new filters and separate logic for filtering. Example filter statements include: filterType = QueryFilterType.OWNER; filter = owner like .*test_user.* filterType = QueryFilterType.RETENTION; filter = retention 90 and retention 30 filterType = QueryFilterType.PARAMS; filter = numPartitions = \2\ and retention_days = \30\ The filter can currently parse string or integer values, where values interpreted as strings must be in quotes. This addresses bug HIVE-2226. https://issues.apache.org/jira/browse/HIVE-2226 Diffs - trunk/metastore/if/hive_metastore.thrift 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g 1135227 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 Diff: https://reviews.apache.org/r/910/diff Testing --- Added test cases to TestHiveMetaStore Thanks, Sohan
Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/#review858 --- trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/878/#comment1877 Can we make this method parameterized to reduce the number of casts required? E.g. private <T> Collection<T> getPartition... We might have to do something like <String>getPartition... when making the call though. - Paul On 2011-06-16 23:30:02, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/ --- (Updated 2011-06-16 23:30:02) Review request for hive and Paul Yang. Summary --- If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. This addresses bug HIVE-2213. https://issues.apache.org/jira/browse/HIVE-2213 Diffs - trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227 Diff: https://reviews.apache.org/r/878/diff Testing --- Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore. Thanks, Sohan
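The review comment about parameterizing the method can be illustrated with a small generic helper. This is a sketch only: runQuery and queryResults are hypothetical names standing in for an untyped query call and its wrapper, not the ObjectStore methods under review.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class GenericQuerySketch {

    /** Stand-in for an untyped query result, as a raw persistence API might return it. */
    static Object runQuery(List<?> rows) {
        return rows;
    }

    /** The single unchecked cast lives here instead of at every call site. */
    @SuppressWarnings("unchecked")
    static <T> Collection<T> queryResults(List<?> rows) {
        return new ArrayList<>((Collection<T>) runQuery(rows));
    }
}
```

A caller can usually rely on inference, as in `Collection<String> names = GenericQuerySketch.queryResults(rows);`. Where the compiler cannot infer T from the context, the explicit form `GenericQuerySketch.<String>queryResults(rows)` is needed, which is the call-site cost the comment mentions.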
[jira] [Commented] (HIVE-2219) Make alter table drop partition more efficient
[ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050847#comment-13050847 ] Paul Yang commented on HIVE-2219: - Can you make a reviewboard instance? Make alter table drop partition more efficient Key: HIVE-2219 URL: https://issues.apache.org/jira/browse/HIVE-2219 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2219.1.patch The current function dropTable() that handles dropping multiple partitions is somewhat inefficient. For each partition you want to drop, it loops through each partition in the table to see if the partition exists. This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table. The running time of this function can be improved, which is useful for tables with many partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-2213: Optimize get_partition_names_ps()
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/#review804 --- You can do this here or in a separate JIRA, but can you update get_partitions_ps() using a similar technique? trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java https://reviews.apache.org/r/878/#comment1753 Can you refactor with the above function since they are similar? trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java https://reviews.apache.org/r/878/#comment1754 Same here trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/878/#comment1755 To be consistent with the other method, maybe call this listPartitionNamesPs? trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java https://reviews.apache.org/r/878/#comment1756 Combine with above - Paul On 2011-06-10 07:05:56, Sohan Jain wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/878/ --- (Updated 2011-06-10 07:05:56) Review request for hive and Paul Yang. Summary --- If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. This addresses bug HIVE-2213.
https://issues.apache.org/jira/browse/HIVE-2213 Diffs - trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205 trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205 Diff: https://reviews.apache.org/r/878/diff Testing --- Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore. Thanks, Sohan
[jira] [Commented] (HIVE-2147) Add api to send / receive message to metastore
[ https://issues.apache.org/jira/browse/HIVE-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047468#comment-13047468 ] Paul Yang commented on HIVE-2147: - I agree with John's suggestion for PARTITION_EVENTS. For this event table, when will rows be dropped? Also, for when partitions are represented using a string, we've followed the convention that they are called partition names. Can we use that for MPartitionSet? Since MPartitionSet.partVals is a string, we should make it indexed, much like partitionName for the PARTITION table. Add api to send / receive message to metastore -- Key: HIVE-2147 URL: https://issues.apache.org/jira/browse/HIVE-2147 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: api-without-thrift.patch, hive_2147-2.patch This is follow-up work on HIVE-2038. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct
[ https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-1595: Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks Sohan! job name for alter table T archive partition P is not correct - Key: HIVE-1595 URL: https://issues.apache.org/jira/browse/HIVE-1595 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Sohan Jain Attachments: Hive-1595.1.patch, Hive-1595.2.patch For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which makes it difficult to identify -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1595) job name for alter table T archive partition P is not correct
[ https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045134#comment-13045134 ] Paul Yang commented on HIVE-1595: - +1 Will test and commit. job name for alter table T archive partition P is not correct - Key: HIVE-1595 URL: https://issues.apache.org/jira/browse/HIVE-1595 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Sohan Jain Attachments: Hive-1595.1.patch, Hive-1595.2.patch For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which makes it difficult to identify -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2029) MetaStore ConnectionURL updates need to trigger creation of Default DB if it doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044032#comment-13044032 ] Paul Yang commented on HIVE-2029: - Can you elaborate on how this retry feature works in datanucleus 3.0? The case that could be handled with the URL hook is as follows - a db host goes down. A failover is performed and a replica on a different host is promoted to be the new master. Using the hook, the client is able to re-execute the query on the new host and the Hive query succeeds without failure. Would it be possible to implement something similar in datanucleus 3.0? MetaStore ConnectionURL updates need to trigger creation of Default DB if it doesn't exist -- Key: HIVE-2029 URL: https://issues.apache.org/jira/browse/HIVE-2029 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Carl Steinbach Attachments: hive_2029.patch HIVE-1219 defined the JDOConnectionURLHook plugin, and integrated this feature into HiveMetaStore. On MetaStore operation failures, this plugin is used to update the metastore ConnectionURL configuration property. Currently this update triggers the reinitialization of the underlying JDO PersistenceManager, but it does not trigger checks to see if the default database exists, nor will it create the default database if it does not exist. It needs to do both. This ticket also covers removing the 'hive.metastore.force.reload.conf' property from HiveConf and HiveMetaStore. This property should not have been added in the first place since its sole purpose is to facilitate testing of the JDOConnectionURLHook mechanism by unnaturally forcing reinitialization of the PersistenceManager. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
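The failover case in the comment (a host goes down, a replica is promoted, the client re-executes against the new host) can be sketched as a retry loop around a URL hook. Everything here is hypothetical: UrlHook, its two methods and executeWithFailover are illustrative names chosen for this sketch, not the JDOConnectionURLHook interface from HIVE-1219 and not anything in datanucleus.

```java
import java.util.function.Function;

class FailoverRetrySketch {

    /** Hypothetical hook that knows which database URL is currently the master. */
    interface UrlHook {
        String currentUrl();
        void notifyBadUrl(String url);
    }

    /** Runs the operation, asking the hook for a fresh URL after each failure. */
    static <R> R executeWithFailover(UrlHook hook, Function<String, R> operation,
                                     int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String url = hook.currentUrl();
            try {
                return operation.apply(url);
            } catch (RuntimeException e) {
                last = e;
                hook.notifyBadUrl(url); // lets the hook switch to the promoted replica
            }
        }
        throw last != null ? last : new IllegalArgumentException("maxAttempts must be positive");
    }
}
```

The point of the shape is that the caller's query succeeds without surfacing the first failure, provided the hook can name a working host within the allowed attempts.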
[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct
[ https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-1595: Status: Open (was: Patch Available) Looks good, but can you remove the changes to readme? job name for alter table T archive partition P is not correct - Key: HIVE-1595 URL: https://issues.apache.org/jira/browse/HIVE-1595 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Sohan Jain Attachments: Hive-1595.1.patch For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which makes it difficult to identify -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-1595) job name for alter table T archive partition P is not correct
[ https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang reassigned HIVE-1595: --- Assignee: Sohan Jain (was: Paul Yang) job name for alter table T archive partition P is not correct - Key: HIVE-1595 URL: https://issues.apache.org/jira/browse/HIVE-1595 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Sohan Jain For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which makes it difficult to identify -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2153) Stats JDBC LIKE queries should escape '_' and '%'
[ https://issues.apache.org/jira/browse/HIVE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2153: Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed. Thanks Ning! Stats JDBC LIKE queries should escape '_' and '%' - Key: HIVE-2153 URL: https://issues.apache.org/jira/browse/HIVE-2153 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2153.2.patch, HIVE-2153.patch DELETE /* Hive stats aggregation: org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsAggregator */ FROM PARTITION_STAT_TBL WHERE ID LIKE 'hdfs://dfsnode:9000/tmp/hive-root/hive_2011-05-09_04-30-28_586_4184342157898880918/-ext-1/ds=2011-05-08/table_name=dim_page_to_user_suggest_assoc/%' It is a prefix query but the '_' in the ID column should be escaped. The same applies to '%' if they appear in ID. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
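The escaping the issue asks for can be sketched as below: every '%' and '_' in the literal prefix, and the escape character itself, is prefixed with the escape character before the trailing '%' wildcard is appended. This is a minimal sketch with hypothetical names, not the committed patch, and the choice of escape character is an assumption since databases differ in their defaults.

```java
class LikeEscapeSketch {

    /** Escapes LIKE wildcards in a literal using the given escape character. */
    static String escapeLikeLiteral(String literal, char escape) {
        StringBuilder sb = new StringBuilder(literal.length() + 8);
        for (char c : literal.toCharArray()) {
            if (c == '%' || c == '_' || c == escape) {
                sb.append(escape);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds a prefix-match pattern; the wildcard is appended after escaping. */
    static String prefixPattern(String prefix, char escape) {
        return escapeLikeLiteral(prefix, escape) + "%";
    }
}
```

The resulting pattern would typically be bound as a statement parameter in a query of the form WHERE ID LIKE ? ESCAPE with the same escape character named explicitly, so that behavior does not depend on a database's default.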
[jira] [Commented] (HIVE-2153) Stats JDBC LIKE queries should escape '_' and '%'
[ https://issues.apache.org/jira/browse/HIVE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030981#comment-13030981 ] Paul Yang commented on HIVE-2153: - +1 Will test and commit Stats JDBC LIKE queries should escape '_' and '%' - Key: HIVE-2153 URL: https://issues.apache.org/jira/browse/HIVE-2153 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2153.2.patch, HIVE-2153.patch DELETE /* Hive stats aggregation: org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsAggregator */ FROM PARTITION_STAT_TBL WHERE ID LIKE 'hdfs://dfsnode:9000/tmp/hive-root/hive_2011-05-09_04-30-28_586_4184342157898880918/-ext-1/ds=2011-05-08/table_name=dim_page_to_user_suggest_assoc/%' It is a prefix query but the '_' in the ID column should be escaped. The same applies to '%' if they appear in ID. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2028) Performance instruments for client side execution
[ https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2028: Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed. Thanks Ning! Performance instruments for client side execution - Key: HIVE-2028 URL: https://issues.apache.org/jira/browse/HIVE-2028 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2028.2.patch, HIVE-2028.3.patch, HIVE-2028.patch Hive client-side execution can sometimes take a long time. This task is to instrument the client-side code to measure the time spent in the components most likely to be expensive.
[jira] Commented: (HIVE-2061) Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility
[ https://issues.apache.org/jira/browse/HIVE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008363#comment-13008363 ] Paul Yang commented on HIVE-2061: - Looks good, will test and commit. Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility -- Key: HIVE-2061 URL: https://issues.apache.org/jira/browse/HIVE-2061 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Priority: Minor Attachments: HIVE-2061.patch We have seen a use case where a user's script runs 'add jar hive_contrib.jar'. Since Hive has renamed the jar file to hive-contrib-{version}.jar, this introduced a backward incompatibility. If we ask the user to change the script, the user will need to change it again whenever Hive is upgraded to a new version. Creating a symlink seems to be the best solution.
[jira] Commented: (HIVE-2061) Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility
[ https://issues.apache.org/jira/browse/HIVE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008585#comment-13008585 ] Paul Yang commented on HIVE-2061: - Committed. Thanks Ning! Create a hive_contrib.jar symlink to hive-contrib-{version}.jar for backward compatibility -- Key: HIVE-2061 URL: https://issues.apache.org/jira/browse/HIVE-2061 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Priority: Minor Attachments: HIVE-2061.patch We have seen a use case where a user's script runs 'add jar hive_contrib.jar'. Since Hive has renamed the jar file to hive-contrib-{version}.jar, this introduced a backward incompatibility. If we ask the user to change the script, the user will need to change it again whenever Hive is upgraded to a new version. Creating a symlink seems to be the best solution.
[jira] Commented: (HIVE-2028) Performance instruments for client side execution
[ https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007625#comment-13007625 ] Paul Yang commented on HIVE-2028: - In PerfLogEnd(): {code} sb.append(/); {code} Shouldn't this be a since this is a close tag? Performance instruments for client side execution - Key: HIVE-2028 URL: https://issues.apache.org/jira/browse/HIVE-2028 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2028.2.patch, HIVE-2028.patch Hive client-side execution can sometimes take a long time. This task is to instrument the client-side code to measure the time spent in the components most likely to be expensive.
[jira] Commented: (HIVE-2028) Performance instruments for client side execution
[ https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007759#comment-13007759 ] Paul Yang commented on HIVE-2028: - +1 Will test and commit. Performance instruments for client side execution - Key: HIVE-2028 URL: https://issues.apache.org/jira/browse/HIVE-2028 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2028.2.patch, HIVE-2028.3.patch, HIVE-2028.patch Hive client-side execution can sometimes take a long time. This task is to instrument the client-side code to measure the time spent in the components most likely to be expensive.
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-1918: Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed. Thanks Krishna! Add export/import facilities to the hive system --- Key: HIVE-1918 URL: https://issues.apache.org/jira/browse/HIVE-1918 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Fix For: 0.8.0 Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf This is an enhancement request to add export/import features to hive. With this language extension, the user can export the data of the table - which may be located in different hdfs locations in case of a partitioned table - as well as the metadata of the table into a specified output location. This output location can then be moved over to another different hadoop/hive instance and imported there. This should work independent of the source and target metastore dbms used; for instance, between derby and mysql. For partitioned tables, the ability to export/import a subset of the partition must be supported. Howl will add more features on top of this: The ability to create/use the exported data even in the absence of hive, using MR or Pig. Please see http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006582#comment-13006582 ] Paul Yang commented on HIVE-1918: - +1 Looks good, will test and commit Add export/import facilities to the hive system --- Key: HIVE-1918 URL: https://issues.apache.org/jira/browse/HIVE-1918 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf This is an enhancement request to add export/import features to hive. With this language extension, the user can export the data of the table - which may be located in different hdfs locations in case of a partitioned table - as well as the metadata of the table into a specified output location. This output location can then be moved over to another different hadoop/hive instance and imported there. This should work independent of the source and target metastore dbms used; for instance, between derby and mysql. For partitioned tables, the ability to export/import a subset of the partition must be supported. Howl will add more features on top of this: The ability to create/use the exported data even in the absence of hive, using MR or Pig. Please see http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2022) Making JDO thread-safe by default
[ https://issues.apache.org/jira/browse/HIVE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002184#comment-13002184 ] Paul Yang commented on HIVE-2022: - Apologies for the build break - Ning and I are looking into fixing some issues with my build environment. Making JDO thread-safe by default - Key: HIVE-2022 URL: https://issues.apache.org/jira/browse/HIVE-2022 Project: Hive Issue Type: Bug Components: Configuration, Metastore Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2022.patch If multiple threads access the metastore concurrently, JDO can throw exceptions because of concurrent access to a HashMap inside JDO. Setting javax.jdo.option.Multithreaded to true solves this issue.
Re: Review Request: HIVE-2025: Fix TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore broken by HIVE-2022
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/464/#review297 --- Ship it! Looks good to me - will test and commit. - Paul On 2011-03-03 13:46:55, Carl Steinbach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/464/ --- (Updated 2011-03-03 13:46:55) Review request for hive. Summary --- Review request for HIVE-2025. This addresses bugs HIVE-2022 and HIVE-2025. https://issues.apache.org/jira/browse/HIVE-2022 https://issues.apache.org/jira/browse/HIVE-2025 Diffs - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1076530 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1076530 Diff: https://reviews.apache.org/r/464/diff Testing --- Thanks, Carl
[jira] Updated: (HIVE-2022) Making JDO thread-safe by default
[ https://issues.apache.org/jira/browse/HIVE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2022: Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed. Thanks Ning! Making JDO thread-safe by default - Key: HIVE-2022 URL: https://issues.apache.org/jira/browse/HIVE-2022 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2022.patch If multiple threads access the metastore concurrently, JDO can throw exceptions because of concurrent access to a HashMap inside JDO. Setting javax.jdo.option.Multithreaded to true solves this issue.
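As a rough illustration of the setting named in HIVE-2022, the sketch below adds `javax.jdo.option.Multithreaded=true` to a set of JDO properties. The class `JdoPropsSketch` and its method are hypothetical names for this example only; how Hive actually wires the default (the review request for HIVE-2025 touches `HiveConf.java` and `ObjectStore.java`) is not shown in this thread and is not reproduced here.

```java
import java.util.Properties;

// Hypothetical helper; only the property name comes from the issue description.
final class JdoPropsSketch {
    private JdoPropsSketch() {}

    /** Returns a copy of the given JDO properties with multithreaded access enabled. */
    static Properties withMultithreaded(Properties base) {
        Properties copy = new Properties();
        copy.putAll(base); // leave the caller's properties untouched
        copy.setProperty("javax.jdo.option.Multithreaded", "true");
        return copy;
    }
}
```

A properties object like this would normally be handed to the JDO bootstrap code when the persistence manager factory is created, so the flag has to be set before the factory is built.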
[jira] Commented: (HIVE-2022) Making JDO thread-safe by default
[ https://issues.apache.org/jira/browse/HIVE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001666#comment-13001666 ] Paul Yang commented on HIVE-2022: - @Mac - sounds like a good idea. I'll backport to 0.7. Making JDO thread-safe by default - Key: HIVE-2022 URL: https://issues.apache.org/jira/browse/HIVE-2022 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.8.0 Attachments: HIVE-2022.patch If multiple threads access the metastore concurrently, JDO can throw exceptions because of concurrent access to a HashMap inside JDO. Setting javax.jdo.option.Multithreaded to true solves this issue.
[jira] Commented: (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13001281#comment-13001281 ] Paul Yang commented on HIVE-1941: - +1 tests passed support explicit view partitioning -- Key: HIVE-1941 URL: https://issues.apache.org/jira/browse/HIVE-1941 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.8.0 Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch, HIVE-1941.5.patch Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions. For more information, see http://wiki.apache.org/hadoop/Hive/PartitionedViews -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-1941: Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks John! support explicit view partitioning -- Key: HIVE-1941 URL: https://issues.apache.org/jira/browse/HIVE-1941 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.8.0 Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch, HIVE-1941.5.patch Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions. For more information, see http://wiki.apache.org/hadoop/Hive/PartitionedViews -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2022) Making JDO thread-safe by default
[ https://issues.apache.org/jira/browse/HIVE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001283#comment-13001283 ] Paul Yang commented on HIVE-2022: - +1 Will commit once tests pass. Making JDO thread-safe by default - Key: HIVE-2022 URL: https://issues.apache.org/jira/browse/HIVE-2022 Project: Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-2022.patch If multiple threads access the metastore concurrently, JDO can throw exceptions because of concurrent access to a HashMap inside JDO. Setting javax.jdo.option.Multithreaded to true solves this issue.
[jira] Assigned: (HIVE-1920) DESCRIBE with comments is difficult to read
[ https://issues.apache.org/jira/browse/HIVE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang reassigned HIVE-1920: --- Assignee: (was: Paul Yang) DESCRIBE with comments is difficult to read --- Key: HIVE-1920 URL: https://issues.apache.org/jira/browse/HIVE-1920 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.7.0 Reporter: Paul Yang Priority: Minor Attachments: HIVE-1920.1.nocomment.patch When DESCRIBE is run, comments for columns are displayed next to the column type. A problem with this is that if the comment contains line breaks, it is difficult to differentiate the columns from the comments and is difficult to read. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2002) Expand exceptions caught for metastore operations
[ https://issues.apache.org/jira/browse/HIVE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2002: Assignee: Paul Yang Status: Patch Available (was: Open) Expand exceptions caught for metastore operations - Key: HIVE-2002 URL: https://issues.apache.org/jira/browse/HIVE-2002 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2002.1.patch Currently, HiveMetaStore.executeWithRetry() catches two classes of exceptions and retries the metastore call when such exceptions occur. However, it does not catch some exceptions that could benefit from a retry: {code} Failed with exception javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : The MySQL server is running with the --read-only option so it cannot execute this statement NestedThrowables: java.sql.SQLException: The MySQL server is running with the --read-only option so it cannot execute this statement FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask {code} In this case, the MySQL server could be temporarily in a read-only mode, and a later DB call may succeed. To handle these situations, this JIRA proposes to expand the class of exceptions caught for retries. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
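The retry behaviour that HIVE-2002 proposes to broaden can be sketched roughly as follows. This is not the code of HiveMetaStore.executeWithRetry(), whose signature and exception list are not shown in this thread. The class `RetrySketch`, its parameters, and the decision to retry on any RuntimeException are assumptions made for illustration (javax.jdo.JDOException is a RuntimeException, so a catch this broad would cover the failure quoted above).

```java
import java.util.function.Supplier;

// Hypothetical retry wrapper; names and retry policy are illustrative assumptions.
final class RetrySketch {
    private RetrySketch() {}

    /**
     * Runs the call up to maxAttempts times, sleeping between failed attempts.
     * Rethrows the last failure if every attempt fails or the thread is interrupted.
     */
    static <T> T executeWithRetry(Supplier<T> call, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // e.g. a transient "read-only" failure from the backing database
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                    break;
                }
            }
        }
        throw last;
    }
}
```

Catching every RuntimeException keeps the sketch short, but a real implementation would likely want a narrower list, since retrying a deterministic failure only delays the error.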
[jira] Updated: (HIVE-2002) Expand exceptions caught for metastore operations
[ https://issues.apache.org/jira/browse/HIVE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2002: Attachment: HIVE-2002.1.patch Expand exceptions caught for metastore operations - Key: HIVE-2002 URL: https://issues.apache.org/jira/browse/HIVE-2002 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0 Reporter: Paul Yang Priority: Minor Attachments: HIVE-2002.1.patch Currently, HiveMetaStore.executeWithRetry() catches two classes of exceptions and retries the metastore call when such exceptions occur. However, it does not catch some exceptions that could benefit from a retry: {code} Failed with exception javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : The MySQL server is running with the --read-only option so it cannot execute this statement NestedThrowables: java.sql.SQLException: The MySQL server is running with the --read-only option so it cannot execute this statement FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask {code} In this case, the MySQL server could be temporarily in a read-only mode, and a later DB call may succeed. To handle these situations, this JIRA proposes to expand the class of exceptions caught for retries. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-1941: support explicit view partitioning
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/390/#review256 --- http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java https://reviews.apache.org/r/390/#comment502 What's the meaning of a hidden virtual column? - Paul On 2011-02-11 09:17:35, John Sichi wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/390/ --- (Updated 2011-02-11 09:17:35) Review request for hive. Summary --- review request from JVS This addresses bug HIVE-1941. https://issues.apache.org/jira/browse/HIVE-1941 Diffs - http://svn.apache.org/repos/asf/hive/trunk/jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AddPartitionDesc.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableDesc.java 
1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure2.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure3.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure4.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure5.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure6.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/alter_view_failure7.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/analyze_view.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure6.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure7.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure8.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_view_failure9.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/create_view_partitioned.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure.q.out 1069561 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure2.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure3.q.out PRE-CREATION 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure4.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure5.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure6.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/alter_view_failure7.q.out PRE-CREATION
[jira] Commented: (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12998832#comment-12998832 ] Paul Yang commented on HIVE-1941: - Patch looks good once we have the aforementioned changes. support explicit view partitioning -- Key: HIVE-1941 URL: https://issues.apache.org/jira/browse/HIVE-1941 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions. For more information, see http://wiki.apache.org/hadoop/Hive/PartitionedViews -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-1941: Status: Open (was: Patch Available) support explicit view partitioning -- Key: HIVE-1941 URL: https://issues.apache.org/jira/browse/HIVE-1941 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions. For more information, see http://wiki.apache.org/hadoop/Hive/PartitionedViews -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-1918: Add export/import facilities to the hive system
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/339/#review255 --- ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java https://reviews.apache.org/r/339/#comment500 Can we avoid nesting the ternary operator? It makes the code a little confusing. There are several instances of this in the diff, but I've just highlighted the first one. ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java https://reviews.apache.org/r/339/#comment501 If we go with the route of having auto-generated code, then we might want to look into having JDO to handle this for us. Datanucleus/JDO has an option to persist to an XML file that might be applicable for this use case. I would agree that some discretion is required in picking fields to serialize for import/export, but the fear is that adding a field now will require many changes. - Paul On 2011-02-04 17:13:17, Carl Steinbach wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/339/ --- (Updated 2011-02-04 17:13:17) Review request for hive. Summary --- Review for HIVE-1918. This addresses bug HIVE-1918. 
https://issues.apache.org/jira/browse/HIVE-1918 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7e5e19f conf/hive-default.xml 46156c0 ql/src/java/org/apache/hadoop/hive/ql/exec/CopyTask.java 30ea670 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 6fea990 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e47992a ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 5f78082 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java b7c51ae ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java d8442b2 ql/src/java/org/apache/hadoop/hive/ql/parse/ErrorMsg.java 01eef69 ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g c5574b0 ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 15e7a13 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 7655154 ql/src/java/org/apache/hadoop/hive/ql/plan/AddPartitionDesc.java e7be269 ql/src/java/org/apache/hadoop/hive/ql/plan/CopyWork.java 7a62ec7 ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java e484fe2 ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java d5bccae ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveUtils.java PRE-CREATION ql/src/test/queries/clientnegative/exim_00_unsupported_schema.q PRE-CREATION ql/src/test/queries/clientnegative/exim_01_nonpart_over_loaded.q PRE-CREATION ql/src/test/queries/clientnegative/exim_02_all_part_over_overlap.q PRE-CREATION ql/src/test/queries/clientnegative/exim_03_nonpart_noncompat_colschema.q PRE-CREATION ql/src/test/queries/clientnegative/exim_04_nonpart_noncompat_colnumber.q PRE-CREATION ql/src/test/queries/clientnegative/exim_05_nonpart_noncompat_coltype.q PRE-CREATION ql/src/test/queries/clientnegative/exim_06_nonpart_noncompat_storage.q 
PRE-CREATION ql/src/test/queries/clientnegative/exim_07_nonpart_noncompat_ifof.q PRE-CREATION ql/src/test/queries/clientnegative/exim_08_nonpart_noncompat_serde.q PRE-CREATION ql/src/test/queries/clientnegative/exim_09_nonpart_noncompat_serdeparam.q PRE-CREATION ql/src/test/queries/clientnegative/exim_10_nonpart_noncompat_bucketing.q PRE-CREATION ql/src/test/queries/clientnegative/exim_11_nonpart_noncompat_sorting.q PRE-CREATION ql/src/test/queries/clientnegative/exim_12_nonnative_export.q PRE-CREATION ql/src/test/queries/clientnegative/exim_13_nonnative_import.q PRE-CREATION ql/src/test/queries/clientnegative/exim_14_nonpart_part.q PRE-CREATION ql/src/test/queries/clientnegative/exim_15_part_nonpart.q PRE-CREATION ql/src/test/queries/clientnegative/exim_16_part_noncompat_schema.q PRE-CREATION ql/src/test/queries/clientnegative/exim_17_part_spec_underspec.q PRE-CREATION ql/src/test/queries/clientnegative/exim_18_part_spec_missing.q PRE-CREATION ql/src/test/queries/clientnegative/exim_19_external_over_existing.q PRE-CREATION ql/src/test/queries/clientnegative/exim_20_managed_location_over_existing.q PRE-CREATION ql/src/test/queries/clientnegative/exim_21_part_managed_external.q PRE-CREATION ql/src/test/queries/clientpositive/exim_00_nonpart_empty.q PRE-CREATION
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12998129#comment-12998129 ] Paul Yang commented on HIVE-1918: - Made a couple of comments on reviewboard. Add export/import facilities to the hive system --- Key: HIVE-1918 URL: https://issues.apache.org/jira/browse/HIVE-1918 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf This is an enhancement request to add export/import features to hive. With this language extension, the user can export the data of the table - which may be located in different hdfs locations in case of a partitioned table - as well as the metadata of the table into a specified output location. This output location can then be moved over to another different hadoop/hive instance and imported there. This should work independent of the source and target metastore dbms used; for instance, between derby and mysql. For partitioned tables, the ability to export/import a subset of the partition must be supported. Howl will add more features on top of this: The ability to create/use the exported data even in the absence of hive, using MR or Pig. Please see http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2001) Add inputs and outputs to authorization ddls.
[ https://issues.apache.org/jira/browse/HIVE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2001: Description: When permissions are changed for a table/partition, the respective object should be present in the read/write entities for hooks to act on. Add inputs and outputs to authorization ddls. - Key: HIVE-2001 URL: https://issues.apache.org/jira/browse/HIVE-2001 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: hive-2001.patch When permissions are changed for a table/partition, the respective object should be present in the read/write entities for hooks to act on.
[jira] Commented: (HIVE-2001) Add inputs and outputs to authorization ddls.
[ https://issues.apache.org/jira/browse/HIVE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998135#comment-12998135 ] Paul Yang commented on HIVE-2001: - +1 will test and commit Add inputs and outputs to authorization ddls. - Key: HIVE-2001 URL: https://issues.apache.org/jira/browse/HIVE-2001 Project: Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: hive-2001.patch When permissions are changed for a table/partition, the respective object should be present in the read/write entities for hooks to act on.
[jira] Updated: (HIVE-2001) Add inputs and outputs to authorization DDL commands
[ https://issues.apache.org/jira/browse/HIVE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-2001: Component/s: Query Processor Affects Version/s: 0.8.0 Summary: Add inputs and outputs to authorization DDL commands (was: Add inputs and outputs to authorization ddls.) Add inputs and outputs to authorization DDL commands Key: HIVE-2001 URL: https://issues.apache.org/jira/browse/HIVE-2001 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.0 Reporter: He Yongqiang Assignee: He Yongqiang Attachments: hive-2001.patch When permissions are changed for a table/partition, the respective object should be present in the read/write entities for hooks to act on.
[jira] Created: (HIVE-2002) Expand exceptions caught for metastore operations
Expand exceptions caught for metastore operations - Key: HIVE-2002 URL: https://issues.apache.org/jira/browse/HIVE-2002 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0 Reporter: Paul Yang Priority: Minor Currently, HiveMetaStore.executeWithRetry() catches two classes of exceptions and retries the metastore call when such exceptions occur. However, it does not catch some exceptions that could benefit from a retry: {code} Failed with exception javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : The MySQL server is running with the --read-only option so it cannot execute this statement NestedThrowables: java.sql.SQLException: The MySQL server is running with the --read-only option so it cannot execute this statement FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask {code} In this case, the MySQL server could be temporarily in a read-only mode, and a later DB call may succeed. To handle these situations, this JIRA proposes to expand the class of exceptions caught for retries.
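The retry idea proposed above can be sketched roughly as follows. This is a minimal illustration and not the actual HiveMetaStore.executeWithRetry() code: the class RetrySketch, the methods callWithRetry() and looksTransient(), and the choice to match on the "--read-only" text from the log above are all assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of Hive. */
final class RetrySketch {
    private RetrySketch() {}

    /**
     * Assumed heuristic: walk the cause chain and treat any exception whose
     * message mentions the MySQL --read-only option as possibly transient.
     */
    static boolean looksTransient(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (msg != null && msg.contains("--read-only")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the call up to maxAttempts times. A failure is retried only if the
     * predicate accepts it; otherwise, or on the last attempt, it is rethrown.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long sleepMillis,
                               Predicate<Throwable> retriable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !retriable.test(e)) {
                    throw e;
                }
                Thread.sleep(sleepMillis); // wait before the next attempt
            }
        }
    }
}
```

Passing the retry decision in as a predicate keeps the set of retriable exceptions easy to widen, which is the change this issue asks for; matching on message text is only a stand-in for whatever exception classes the real fix would catch.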
[jira] Commented: (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996593#comment-12996593 ] Paul Yang commented on HIVE-1941: - @John - Yes, that's what I meant. I'll take a look at the whole patch as well. support explicit view partitioning -- Key: HIVE-1941 URL: https://issues.apache.org/jira/browse/HIVE-1941 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions. For more information, see http://wiki.apache.org/hadoop/Hive/PartitionedViews
[jira] Commented: (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995666#comment-12995666 ] Paul Yang commented on HIVE-1941: - It looks like it's possible with the current Thrift add_partition() method to create a partition for a view with a non-null SD/location. Can we put in a check to guard against this case? Other than that, it looks good from the metastore/replication side. support explicit view partitioning -- Key: HIVE-1941 URL: https://issues.apache.org/jira/browse/HIVE-1941 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions. For more information, see http://wiki.apache.org/hadoop/Hive/PartitionedViews
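The guard requested in the comment above might look something like the sketch below. It is only an illustration under stated assumptions: ViewPartitionGuardSketch, PartitionSpec, and checkPartition() are hypothetical names, and the real check would operate on the metastore's Thrift Table and Partition objects rather than on these stand-ins.

```java
/** Hypothetical stand-in for a metastore-side check; not Hive code. */
final class ViewPartitionGuardSketch {
    private ViewPartitionGuardSketch() {}

    /** Simplified partition: a null storageLocation models "no SD/location". */
    static final class PartitionSpec {
        final String storageLocation;

        PartitionSpec(String storageLocation) {
            this.storageLocation = storageLocation;
        }
    }

    /**
     * Rejects a partition that carries a storage location when the target
     * table is a view. Partitions of regular tables are not restricted here.
     */
    static void checkPartition(boolean tableIsView, PartitionSpec part) {
        if (tableIsView && part.storageLocation != null) {
            throw new IllegalArgumentException(
                "a view partition must not have a storage location: "
                    + part.storageLocation);
        }
    }
}
```

Under the same assumptions, one shared check could be called from both add_partition() and append_partition(), so both entry points reject the same case.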
[jira] Commented: (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995668#comment-12995668 ] Paul Yang commented on HIVE-1941: - Similarly, we should handle the case when calling append_partition() on a view. support explicit view partitioning -- Key: HIVE-1941 URL: https://issues.apache.org/jira/browse/HIVE-1941 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions. For more information, see http://wiki.apache.org/hadoop/Hive/PartitionedViews
[jira] Created: (HIVE-1995) Mismatched open/commit transaction calls when using get_partition()
Mismatched open/commit transaction calls when using get_partition() --- Key: HIVE-1995 URL: https://issues.apache.org/jira/browse/HIVE-1995 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Paul Yang Priority: Minor Nested executeWithRetry() calls caused by using HiveMetaStore.get_partition() can result in mismatched open/commit calls. Fixes the same issue as described in HIVE-1760.
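To make the failure mode concrete, here is a small model of counted (nested) transactions. It is a sketch only: TxnCounterSketch and its methods are hypothetical, and the message above does not describe the exact sequence behind HIVE-1995, so the scenario in the usage note is one plausible way a nested retry could unbalance the count, not a reconstruction of the actual bug.

```java
/** Hypothetical model of nested open/commit bookkeeping; not Hive code. */
final class TxnCounterSketch {
    private int openCalls = 0;        // number of opens not yet committed
    private int physicalCommits = 0;  // commits that reached the "database"

    /** Each logical open increases the nesting depth. */
    void open() {
        openCalls++;
    }

    /** Only the outermost commit performs a physical commit. */
    void commit() {
        if (openCalls == 0) {
            throw new IllegalStateException("commit without matching open");
        }
        openCalls--;
        if (openCalls == 0) {
            physicalCommits++;
        }
    }

    /** A rollback abandons every nesting level at once. */
    void rollback() {
        openCalls = 0;
    }

    boolean isBalanced() {
        return openCalls == 0;
    }

    int physicalCommits() {
        return physicalCommits;
    }
}
```

In this model, suppose an outer call opens a transaction, a nested call opens a second level, fails, and rolls back, and the nested retry then opens and commits on its own. The count is back at zero at that point, so the outer call's commit() has no matching open and throws.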
[jira] Updated: (HIVE-1995) Mismatched open/commit transaction calls when using get_partition()
[ https://issues.apache.org/jira/browse/HIVE-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-1995: Attachment: HIVE-1995.1.patch Mismatched open/commit transaction calls when using get_partition() --- Key: HIVE-1995 URL: https://issues.apache.org/jira/browse/HIVE-1995 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Paul Yang Priority: Minor Attachments: HIVE-1995.1.patch Nested executeWithRetry() calls caused by using HiveMetaStore.get_partition() can result in mismatched open/commit calls. Fixes the same issue as described in HIVE-1760.