[jira] [Created] (HIVE-2366) Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the old COLUMNS table
Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the old COLUMNS table --- Key: HIVE-2366 URL: https://issues.apache.org/jira/browse/HIVE-2366 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain The upgrade scripts for the hive metastore in HIVE-2246 do not upgrade the indexes. They also need to rename the old COLUMNS table after migration so that old clients will not accidentally access the COLUMNS table. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2366) Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the old COLUMNS table
[ https://issues.apache.org/jira/browse/HIVE-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2366: - Attachment: HIVE-2366.1.patch Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the old COLUMNS table --- Key: HIVE-2366 URL: https://issues.apache.org/jira/browse/HIVE-2366 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2366.1.patch The upgrade scripts for the hive metastore in HIVE-2246 do not upgrade the indexes. They also need to rename the old COLUMNS table after migration so that old clients will not accidentally access the COLUMNS table. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2367) Indexes' storage descriptors' columns are not deduped, and altering an index leaves behind an unused storage descriptor
Indexes' storage descriptors' columns are not deduped, and altering an index leaves behind an unused storage descriptor --- Key: HIVE-2367 URL: https://issues.apache.org/jira/browse/HIVE-2367 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain The metastore migration in HIVE-2246 does not dedupe the COLUMNS information for indexes. That is, each entry in the IDXS table has a storage descriptor that always points to a new column descriptor, which is unlikely to be shared by any other storage descriptor. Therefore, altering an index creates a new storage descriptor and a new column descriptor. No other objects reference the old storage descriptor and column descriptor, but they persist in the metastore db.
[jira] [Updated] (HIVE-2368) Determining whether a Column Descriptor is unused may take too long
[ https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2368: - Attachment: HIVE-2368.1.patch Determining whether a Column Descriptor is unused may take too long --- Key: HIVE-2368 URL: https://issues.apache.org/jira/browse/HIVE-2368 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Attachments: HIVE-2368.1.patch To determine if a column descriptor is unused, we call listStorageDescriptorsWithCD(), which may return a big list of SDs. This can severely slow down dropping partitions. We can add a maximum number of SDs to return, and just ask for 1 SD, since we are just doing an existential check. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
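The existence check described in HIVE-2368 can be illustrated with a minimal sketch. It is not the patch itself: it uses an in-memory SQLite database in place of the metastore db, and the simplified table layout (an `SDS` table with a `CD_ID` column) and the function names are assumptions made for illustration.

```python
import sqlite3


def build_demo_db(num_sds: int) -> sqlite3.Connection:
    """Create a toy table of storage descriptors that all share column descriptor 1."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified layout: one row per storage descriptor.
    conn.execute("CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER)")
    conn.executemany(
        "INSERT INTO SDS (SD_ID, CD_ID) VALUES (?, ?)",
        [(sd_id, 1) for sd_id in range(num_sds)],
    )
    return conn


def cd_is_unused_slow(conn: sqlite3.Connection, cd_id: int) -> bool:
    """Fetch every referencing storage descriptor, then test for emptiness."""
    rows = conn.execute("SELECT SD_ID FROM SDS WHERE CD_ID = ?", (cd_id,)).fetchall()
    return len(rows) == 0


def cd_is_unused_fast(conn: sqlite3.Connection, cd_id: int) -> bool:
    """Ask for at most one referencing storage descriptor."""
    row = conn.execute(
        "SELECT SD_ID FROM SDS WHERE CD_ID = ? LIMIT 1", (cd_id,)
    ).fetchone()
    return row is None


conn = build_demo_db(10_000)
print(cd_is_unused_slow(conn, 1), cd_is_unused_fast(conn, 1))
print(cd_is_unused_slow(conn, 2), cd_is_unused_fast(conn, 2))
```

Both functions give the same answer, but the second one never materializes more than one row, which is the point of capping the number of SDs returned.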
[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2246: - Attachment: HIVE-2246.8.patch Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch, HIVE-2246.8.patch Note: this patch proposes a schema change, and is therefore incompatible with the current metastore. We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest-growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. One idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. Please see the latest review board for additional implementation details.
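To make the row-count estimate in the HIVE-2246 description concrete, here is a small worked example. The table, partition, and column counts are made-up figures chosen only for illustration, and the "after" figure assumes one Column Descriptor per table (no schema evolution).

```python
def columns_rows_before(num_tables: int, num_partitions: int, avg_cols: int) -> int:
    """Rows in COLUMNS when every table and every partition stores its own column list."""
    return (num_partitions + num_tables) * avg_cols


def columns_rows_after(num_tables: int, avg_cols: int) -> int:
    """Rows in COLUMNS when partitions share their table's column list."""
    return num_tables * avg_cols


# Hypothetical deployment: 1,000 tables, 500,000 partitions, 20 columns per table.
tables, partitions, cols = 1_000, 500_000, 20

before = columns_rows_before(tables, partitions, cols)
after = columns_rows_after(tables, cols)

print(before)  # 10020000
print(after)   # 20000
```

With these numbers the COLUMNS table shrinks by a factor of about 500, and the extra cost is roughly one Column Descriptor row per table.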
[jira] [Commented] (HIVE-2356) Fix udtf_explode.q and udf_explode.q test failures
[ https://issues.apache.org/jira/browse/HIVE-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081333#comment-13081333 ] Sohan Jain commented on HIVE-2356: -- @Carl: out of curiosity, what was the fix here? I was trying to trace this error for a little while. Fix udtf_explode.q and udf_explode.q test failures -- Key: HIVE-2356 URL: https://issues.apache.org/jira/browse/HIVE-2356 Project: Hive Issue Type: Bug Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-2356-fix-explode.1.patch.txt
[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2246: - Attachment: HIVE-2246.4.patch Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch Note: this patch proposes a schema change, and is therefore incompatible with the current metastore. We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest-growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. One idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. Please see the latest review board for additional implementation details.
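The extra level of indirection proposed here can be pictured with a toy object model. This is only a sketch of the idea, not the JDO models from the patch: the class and field names below are assumptions, and the table and paths are invented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ColumnDescriptor:
    """Holds one list of (column name, column type) pairs."""
    columns: List[Tuple[str, str]]


@dataclass
class StorageDescriptor:
    """Points at a column descriptor instead of owning a private column list."""
    location: str
    cd: ColumnDescriptor


@dataclass
class Partition:
    values: List[str]
    sd: StorageDescriptor


@dataclass
class Table:
    name: str
    sd: StorageDescriptor
    partitions: List[Partition] = field(default_factory=list)

    def add_partition(self, values: List[str], location: str) -> Partition:
        # The new partition reuses the table's current column descriptor.
        part = Partition(values, StorageDescriptor(location, self.sd.cd))
        self.partitions.append(part)
        return part


schema = ColumnDescriptor([("user_id", "bigint"), ("url", "string")])
table = Table("page_views", StorageDescriptor("/warehouse/page_views", schema))
p1 = table.add_partition(["2011-08-01"], "/warehouse/page_views/ds=2011-08-01")
p2 = table.add_partition(["2011-08-02"], "/warehouse/page_views/ds=2011-08-02")

# Count distinct column descriptor objects across the table and its partitions.
distinct_cds = {id(table.sd.cd)} | {id(p.sd.cd) for p in table.partitions}
print(len(distinct_cds))
```

However many partitions are added, the column list is stored once. A schema change would create a second `ColumnDescriptor` for the table while older partitions keep pointing at the first.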
[jira] [Updated] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2322: - Attachment: HIVE-2322.4.patch fixed the broken test cases Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, HIVE-2322.4.patch We store metadata about ColumnarSerDes in the metastore, so ColumnarSerDe should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are shown only as "from deserializer". Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes.
[jira] [Updated] (HIVE-2319) Calling alter_table after changing partition comment throws an exception
[ https://issues.apache.org/jira/browse/HIVE-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2319: - Attachment: HIVE-2319.4.patch Calling alter_table after changing partition comment throws an exception Key: HIVE-2319 URL: https://issues.apache.org/jira/browse/HIVE-2319 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2319.2.patch, HIVE-2319.3.patch, HIVE-2319.4.patch Altering a table's partition key comments raises an InvalidOperationException. The partition key name and type should not be mutable, but the comment should be changeable.
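The intended rule (name and type fixed, comment free to change) could be checked along these lines. This is a hedged sketch rather than the metastore's code: `FieldSchema` here is a stand-in tuple, and `check_partition_keys` and `InvalidOperationError` are hypothetical names.

```python
from typing import List, NamedTuple


class FieldSchema(NamedTuple):
    """Stand-in for a partition key definition."""
    name: str
    type: str
    comment: str


class InvalidOperationError(Exception):
    """Raised when an alter would change an immutable part of a partition key."""


def check_partition_keys(old: List[FieldSchema], new: List[FieldSchema]) -> None:
    """Allow comment edits, reject any change to key count, name, or type."""
    if len(old) != len(new):
        raise InvalidOperationError("partition keys cannot be added or removed")
    for old_key, new_key in zip(old, new):
        # Compare only the immutable parts; the comment is deliberately ignored.
        if (old_key.name, old_key.type) != (new_key.name, new_key.type):
            raise InvalidOperationError(
                f"partition key '{old_key.name}' cannot change name or type"
            )


old_keys = [FieldSchema("ds", "string", "")]
commented_keys = [FieldSchema("ds", "string", "date stamp")]
retyped_keys = [FieldSchema("ds", "int", "")]

check_partition_keys(old_keys, commented_keys)  # accepted: only the comment changed

try:
    check_partition_keys(old_keys, retyped_keys)
    retype_rejected = False
except InvalidOperationError:
    retype_rejected = True
print(retype_rejected)
```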
[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079537#comment-13079537 ] Sohan Jain commented on HIVE-2322: -- Yes, it looks like some of the output.q files were updated and are now conflicting. I've been re-running the test suite and re-generating them. Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch We store metadata about ColumnarSerDes in the metastore, so ColumnarSerDe should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are shown only as "from deserializer". Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes.
[jira] [Updated] (HIVE-2338) Alter table always throws an unhelpful error on failure
[ https://issues.apache.org/jira/browse/HIVE-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2338: - Status: Patch Available (was: Open) Alter table always throws an unhelpful error on failure --- Key: HIVE-2338 URL: https://issues.apache.org/jira/browse/HIVE-2338 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Priority: Minor Attachments: HIVE-2338.1.patch Every failure in an alter table function returns a MetaException. When altering tables and catching exceptions, we throw a MetaException in the finally part of a try-catch-finally block, which overrides any other exception thrown.
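The masking effect described in HIVE-2338 is easy to reproduce. The sketch below is a Python analogue of the Java pattern, not the Hive code: in both languages, an exception raised inside a finally block replaces the one that was already propagating. The function names and messages are invented for illustration.

```python
class MetaException(Exception):
    pass


class InvalidOperationException(Exception):
    pass


def validate_new_table() -> None:
    """Stand-in for the step that detects the real problem."""
    raise InvalidOperationException("partition keys cannot be changed")


def alter_table_buggy() -> None:
    success = False
    try:
        validate_new_table()
        success = True
    finally:
        if not success:
            # Raising here discards the InvalidOperationException in flight.
            raise MetaException("alter table failed")


def alter_table_fixed(cleanup_log: list) -> None:
    success = False
    try:
        validate_new_table()
        success = True
    finally:
        if not success:
            # Clean up only; the original exception keeps propagating.
            cleanup_log.append("rolled back")


def name_of_raised(fn) -> str:
    """Return the class name of whatever exception fn raises, or '' if none."""
    try:
        fn()
    except Exception as exc:
        return type(exc).__name__
    return ""


log: list = []
print(name_of_raised(alter_table_buggy))
print(name_of_raised(lambda: alter_table_fixed(log)))
```

The first call reports `MetaException`, hiding the real cause; the second reports `InvalidOperationException` because the finally block no longer raises.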
[jira] [Created] (HIVE-2319) Calling alter_table after changing partition comment throws an exception
Calling alter_table after changing partition comment throws an exception Key: HIVE-2319 URL: https://issues.apache.org/jira/browse/HIVE-2319 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Altering a table's partition key comments raises an InvalidOperationException. The partition key name and type should not be mutable, but the comment should be changeable.
[jira] [Updated] (HIVE-2319) Calling alter_table after changing partition comment throws an exception
[ https://issues.apache.org/jira/browse/HIVE-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2319: - Assignee: Sohan Jain Status: Patch Available (was: Open) Calling alter_table after changing partition comment throws an exception Key: HIVE-2319 URL: https://issues.apache.org/jira/browse/HIVE-2319 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2319.2.patch Altering a table's partition key comments raises an InvalidOperationException. The partition key name and type should not be mutable, but the comment should be changeable.
[jira] [Updated] (HIVE-2319) Calling alter_table after changing partition comment throws an exception
[ https://issues.apache.org/jira/browse/HIVE-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2319: - Attachment: HIVE-2319.2.patch Calling alter_table after changing partition comment throws an exception Key: HIVE-2319 URL: https://issues.apache.org/jira/browse/HIVE-2319 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sohan Jain Attachments: HIVE-2319.2.patch Altering a table's partition key comments raises an InvalidOperationException. The partition key name and type should not be mutable, but the comment should be changeable.
[jira] [Created] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain We store metadata about ColumnarSerDes in the metastore, so ColumnarSerDe should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are shown only as "from deserializer". Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes.
[jira] [Updated] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes
[ https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2322: - Attachment: HIVE-2322.1.patch Add ColumnarSerDe to the list of native SerDes -- Key: HIVE-2322 URL: https://issues.apache.org/jira/browse/HIVE-2322 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2322.1.patch We store metadata about ColumnarSerDes in the metastore, so ColumnarSerDe should be considered a native SerDe. Then, column information can be retrieved from the metastore instead of from deserialization. Currently, for non-native SerDes, column comments are shown only as "from deserializer". Adding ColumnarSerDe to the list of native SerDes will persist column comments. See HIVE-2171 for persisting the column comments of custom SerDes.
[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2226: - Attachment: HIVE-2226.4.patch include auto-gen thrift files Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similarly to the one in HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed.
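The pruning behaviour requested in HIVE-2226 can be sketched in memory. Only the function name `get_table_names_by_filter` comes from the issue; the signature, the `TableInfo` record, and the use of a Python predicate in place of a filter string are assumptions. The issue asks for filtering at the JDO level, which this toy version does not attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TableInfo:
    """Stand-in for the table attributes the filter may inspect."""
    name: str
    owner: str
    retention: int
    parameters: Dict[str, str] = field(default_factory=dict)


def get_table_names_by_filter(
    tables: List[TableInfo],
    predicate: Callable[[TableInfo], bool],
    max_tables: int = -1,
) -> List[str]:
    """Return names of tables matching the predicate; a negative max means no cap."""
    names = [t.name for t in tables if predicate(t)]
    return names if max_tables < 0 else names[:max_tables]


catalog = [
    TableInfo("clicks", "alice", 30, {"team": "ads"}),
    TableInfo("views", "bob", 90, {"team": "ads"}),
    TableInfo("logs", "alice", 7, {"team": "infra"}),
]

by_owner = get_table_names_by_filter(catalog, lambda t: t.owner == "alice")
by_param = get_table_names_by_filter(catalog, lambda t: t.parameters.get("team") == "ads")
by_retention = get_table_names_by_filter(catalog, lambda t: t.retention >= 30, max_tables=1)

print(by_owner)
print(by_param)
print(by_retention)
```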
[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2246: - Description: Note: this patch proposes a schema change, and is therefore incompatible with the current metastore. We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest-growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. One idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. Please see the latest review board for additional implementation details. was: We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest-growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. One idea is to create an additional level of indirection with a Column Descriptor that has a list of columns.
A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. Tags: metastore, schema, JDO Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2246.2.patch Note: this patch proposes a schema change, and is therefore incompatible with the current metastore. We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest-growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. One idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows.
We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. Please see the latest review board for additional implementation details.
[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2246: - Attachment: HIVE-2246.3.patch Adding some missing files that I forgot to svn add Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch Note: this patch proposes a schema change, and is therefore incompatible with the current metastore. We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest-growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. One idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. Please see the latest review board for additional implementation details.
[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2226: - Attachment: HIVE-2226.3.patch Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similarly to the one in HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed.
[jira] [Created] (HIVE-2275) Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions
Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions -- Key: HIVE-2275 URL: https://issues.apache.org/jira/browse/HIVE-2275 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain HIVE-2219 applied an incorrect patch that fails unit tests. This patch reverts those changes and adds the intended changes to improve the efficiency of dropping multiple partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2275) Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2275: - Attachment: HIVE-2275.1.patch Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions -- Key: HIVE-2275 URL: https://issues.apache.org/jira/browse/HIVE-2275 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2275.1.patch HIVE-2219 applied an incorrect patch that fails unit tests. This patch reverts those changes and adds the intended changes to improve the efficiency of dropping multiple partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2275) Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2275: - Status: Patch Available (was: Open) Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions -- Key: HIVE-2275 URL: https://issues.apache.org/jira/browse/HIVE-2275 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2275.1.patch HIVE-2219 applied an incorrect patch that fails unit tests. This patch reverts those changes and adds the intended changes to improve the efficiency of dropping multiple partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2219) Make alter table drop partition more efficient
[ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062167#comment-13062167 ] Sohan Jain commented on HIVE-2219: -- Ok, please refer to HIVE-2275 Make alter table drop partition more efficient Key: HIVE-2219 URL: https://issues.apache.org/jira/browse/HIVE-2219 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2219.1.patch, HIVE-2219.2.patch The current function dropTable() that handles dropping multiple partitions is somewhat inefficient. For each partition you want to drop, it loops through each partition in the table to see if the partition exists. This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table. The running time of this function can be improved, which is useful for tables with many partitions.
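The O(mn) pattern described in HIVE-2219, and one common way to avoid it, can be compared side by side. The set-based version below is a generic technique and is not necessarily what the patch does; the function names and partition strings are made up.

```python
from typing import List


def find_existing_nested(to_drop: List[str], all_partitions: List[str]) -> List[str]:
    """O(m*n): scan the whole partition list once per partition to drop."""
    found = []
    for wanted in to_drop:
        for existing in all_partitions:
            if wanted == existing:
                found.append(wanted)
                break
    return found


def find_existing_indexed(to_drop: List[str], all_partitions: List[str]) -> List[str]:
    """O(m + n) on average: build a set once, then do constant-time lookups."""
    existing = set(all_partitions)
    return [wanted for wanted in to_drop if wanted in existing]


all_parts = [f"ds=2011-07-{day:02d}" for day in range(1, 32)]
drop_request = ["ds=2011-07-04", "ds=2011-07-15", "ds=2011-08-01"]

print(find_existing_nested(drop_request, all_parts))
print(find_existing_indexed(drop_request, all_parts))
```

Both calls return the same two partitions; the difference only shows up in running time as m and n grow.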
[jira] [Updated] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners
[ https://issues.apache.org/jira/browse/HIVE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2194: - Attachment: HIVE-2194.4.patch Add actions for alter table and alter partition events for metastore event listeners Key: HIVE-2194 URL: https://issues.apache.org/jira/browse/HIVE-2194 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Fix For: 0.8.0 Attachments: HIVE-2194.1.patch, HIVE-2194.3.patch, HIVE-2194.4.patch HIVE-2038 introduced the MetaStoreEventListener abstract class that defines actions to be performed after particular events on a metastore. Improve upon that class by adding events to be performed on alter table and alter partition actions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
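As a rough illustration of what adding alter events to a listener class might look like, here is a Python sketch. It does not reproduce the Java `MetaStoreEventListener` API: the method names, arguments, and the `alter_table` helper are all assumptions.

```python
from typing import Dict, List, Tuple


class MetaStoreEventListener:
    """Base class with no-op hooks; subclasses override the events they care about."""

    def on_alter_table(self, old_table: dict, new_table: dict) -> None:
        pass

    def on_alter_partition(self, old_partition: dict, new_partition: dict) -> None:
        pass


class AuditListener(MetaStoreEventListener):
    """Example listener that records every alter event it sees."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, dict, dict]] = []

    def on_alter_table(self, old_table: dict, new_table: dict) -> None:
        self.events.append(("ALTER_TABLE", old_table, new_table))

    def on_alter_partition(self, old_partition: dict, new_partition: dict) -> None:
        self.events.append(("ALTER_PARTITION", old_partition, new_partition))


def alter_table(
    tables: Dict[str, dict],
    listeners: List[MetaStoreEventListener],
    name: str,
    new_table: dict,
) -> None:
    """Apply the change first, then notify listeners with the old and new definitions."""
    old_table = tables[name]
    tables[name] = new_table
    for listener in listeners:
        listener.on_alter_table(old_table, new_table)


tables = {"clicks": {"owner": "alice"}}
audit = AuditListener()
alter_table(tables, [audit], "clicks", {"owner": "bob"})
print(audit.events)
```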
[jira] [Created] (HIVE-2276) Fix Inconsistency between RB and JIRA patches for HIVE-2194
Fix Inconsistency between RB and JIRA patches for HIVE-2194 --- Key: HIVE-2276 URL: https://issues.apache.org/jira/browse/HIVE-2276 Project: Hive Issue Type: Bug Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2276.1.patch The RB and JIRA patches for HIVE-2194 were out of sync. An outdated patch for HIVE-2194 was committed. This patch updates that patch to include the changes from RB. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2276) Fix Inconsistency between RB and JIRA patches for HIVE-2194
[ https://issues.apache.org/jira/browse/HIVE-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2276: - Attachment: HIVE-2276.1.patch Fix Inconsistency between RB and JIRA patches for HIVE-2194 --- Key: HIVE-2276 URL: https://issues.apache.org/jira/browse/HIVE-2276 Project: Hive Issue Type: Bug Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2276.1.patch The RB and JIRA patches for HIVE-2194 were out of sync. An outdated patch for HIVE-2194 was committed. This patch updates that patch to include the changes from RB. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2276) Fix Inconsistency between RB and JIRA patches for HIVE-2194
[ https://issues.apache.org/jira/browse/HIVE-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2276: - Status: Patch Available (was: Open) Fix Inconsistency between RB and JIRA patches for HIVE-2194 --- Key: HIVE-2276 URL: https://issues.apache.org/jira/browse/HIVE-2276 Project: Hive Issue Type: Bug Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2276.1.patch The RB and JIRA patches for HIVE-2194 were out of sync. An outdated patch for HIVE-2194 was committed. This patch updates that patch to include the changes from RB. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2256) Better error message in CLI on invalid column name
[ https://issues.apache.org/jira/browse/HIVE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2256: - Status: Patch Available (was: Open) Better error message in CLI on invalid column name -- Key: HIVE-2256 URL: https://issues.apache.org/jira/browse/HIVE-2256 Project: Hive Issue Type: Improvement Components: CLI, Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2256.1.patch, HIVE-2256.2.patch In the CLI, if a user inputs an incorrect column name, we currently just print the bad column name in the query. Typically, the user needs to describe the table to figure out the correct column. This patch prints out a list of valid column and partition names in the table when a user inputs an incorrect column name. e.g., {{Error in semantic analysis: Invalid table alias or column reference 'col_does_not_exist' (possible column names are: col1, col2)}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2256) Better error message in CLI on invalid column name
Better error message in CLI on invalid column name -- Key: HIVE-2256 URL: https://issues.apache.org/jira/browse/HIVE-2256 Project: Hive Issue Type: Improvement Components: CLI, Query Processor Reporter: Sohan Jain In the CLI, if a user inputs an incorrect column name, we currently just print the bad column name in the query. Typically, the user needs to describe the table to figure out the correct column. This patch prints out a list of valid column and partition names in the table when a user inputs an incorrect column name. e.g., {{Error in semantic analysis: Invalid table alias or column reference 'col_does_not_exist' (possible column names are: col1, col2)}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2256) Better error message in CLI on invalid column name
[ https://issues.apache.org/jira/browse/HIVE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2256: - Attachment: HIVE-2256.1.patch The patch will probably need to edit test cases as well to account for the new error messages. Better error message in CLI on invalid column name -- Key: HIVE-2256 URL: https://issues.apache.org/jira/browse/HIVE-2256 Project: Hive Issue Type: Improvement Components: CLI, Query Processor Reporter: Sohan Jain Attachments: HIVE-2256.1.patch In the CLI, if a user inputs an incorrect column name, we currently just print the bad column name in the query. Typically, the user needs to describe the table to figure out the correct column. This patch prints out a list of valid column and partition names in the table when a user inputs an incorrect column name. e.g., {{Error in semantic analysis: Invalid table alias or column reference 'col_does_not_exist' (possible column names are: col1, col2)}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2256) Better error message in CLI on invalid column name
[ https://issues.apache.org/jira/browse/HIVE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain reassigned HIVE-2256: Assignee: Sohan Jain Better error message in CLI on invalid column name -- Key: HIVE-2256 URL: https://issues.apache.org/jira/browse/HIVE-2256 Project: Hive Issue Type: Improvement Components: CLI, Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2256.1.patch In the CLI, if a user inputs an incorrect column name, we currently just print the bad column name in the query. Typically, the user needs to describe the table to figure out the correct column. This patch prints out a list of valid column and partition names in the table when a user inputs an incorrect column name. e.g., {{Error in semantic analysis: Invalid table alias or column reference 'col_does_not_exist' (possible column names are: col1, col2)}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2256) Better error message in CLI on invalid column name
[ https://issues.apache.org/jira/browse/HIVE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2256: - Attachment: HIVE-2256.2.patch -Updated the outputs for unit tests. -Fixed the error message to include the line number and position in the query of the invalid column. Better error message in CLI on invalid column name -- Key: HIVE-2256 URL: https://issues.apache.org/jira/browse/HIVE-2256 Project: Hive Issue Type: Improvement Components: CLI, Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2256.1.patch, HIVE-2256.2.patch In the CLI, if a user inputs an incorrect column name, we currently just print the bad column name in the query. Typically, the user needs to describe the table to figure out the correct column. This patch prints out a list of valid column and partition names in the table when a user inputs an incorrect column name. e.g., {{Error in semantic analysis: Invalid table alias or column reference 'col_does_not_exist' (possible column names are: col1, col2)}} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
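The message format described above can be sketched roughly as follows. This is only an illustration of the string being assembled, not the patch itself: the function name `invalid_column_message` and its parameters are hypothetical, and the line number and position mentioned for HIVE-2256.2.patch are left out.

```python
def invalid_column_message(bad_column, column_names, partition_names=()):
    """Build an error string that lists the valid names next to the bad one.

    Hypothetical helper for illustration; the real patch lives in Hive's
    semantic analyzer and is written in Java.
    """
    # Column names come first, then partition names, as in the description.
    candidates = list(column_names) + list(partition_names)
    joined = ", ".join(candidates)
    return (
        "Error in semantic analysis: Invalid table alias or column reference "
        f"'{bad_column}' (possible column names are: {joined})"
    )


msg = invalid_column_message("col_does_not_exist", ["col1", "col2"])
print(msg)
```

With the example table from the description (columns col1 and col2, no partition columns), this reproduces the sample error text quoted in the issue.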
[jira] [Created] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
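To make the row-count estimate above concrete, here is a small worked example. The figures (1,000 tables, 1,000,000 partitions, 20 columns per table) are made up for illustration and do not come from any real metastore; the function names are likewise hypothetical.

```python
def columns_rows_before(num_tables, num_partitions, avg_columns_per_table):
    # Every table and every partition stores its own copy of the column list.
    return (num_partitions + num_tables) * avg_columns_per_table


def columns_rows_after(num_tables, avg_columns_per_table):
    # Partitions share their parent table's Column Descriptor.
    # This is a lower bound: schema evolution can leave a table with
    # more than one Column Descriptor.
    return num_tables * avg_columns_per_table


# Hypothetical sizes, chosen only to show the order of magnitude.
before = columns_rows_before(1_000, 1_000_000, 20)
after = columns_rows_after(1_000, 20)
print(before, after)
```

Under these assumed sizes the COLUMNS table shrinks from about twenty million rows to twenty thousand, plus one Column Descriptor row per table.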
[jira] [Updated] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions
[ https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2242: - Attachment: HIVE-2242.1.patch - use db.getPartitions instead of db.getPartition to accommodate partial specifications DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions - Key: HIVE-2242 URL: https://issues.apache.org/jira/browse/HIVE-2242 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2242.1.patch Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to Pre Execution hooks. It should also include all matches from partial specifications. E.g., suppose you have a table {{create table test_table (a string) partitioned by (p1 string, p2 string);}} {{alter table test_table add partition (p1=1, p2=1);}} {{alter table test_table add partition (p1=1, p2=2);}} {{alter table test_table add partition (p1=2, p2=2);}} and you run {{alter table test_table drop partition(p1=1);}} Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get the WriteEntity's with the partitions p1=1/p2=1 and p1=1/p2=2 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
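The expected behavior for partial specifications can be sketched as below. This is a simplified model that treats each partition as a plain mapping of partition column to value; the helper names are hypothetical and it does not call db.getPartitions or any other Hive API.

```python
def matches_partial_spec(partition, partial_spec):
    # A partition matches when it agrees with every key the spec mentions;
    # keys the spec leaves out are unconstrained.
    return all(partition.get(key) == value for key, value in partial_spec.items())


def partitions_matching(partitions, partial_spec):
    return [p for p in partitions if matches_partial_spec(p, partial_spec)]


# The three partitions added in the example above.
partitions = [
    {"p1": "1", "p2": "1"},
    {"p1": "1", "p2": "2"},
    {"p1": "2", "p2": "2"},
]

# drop partition(p1=1) is a partial specification: p2 is not given.
matched = partitions_matching(partitions, {"p1": "1"})
names = ["/".join(f"{k}={v}" for k, v in p.items()) for p in matched]
print(names)
```

The two matched names are the partitions that the pre-execution hooks are expected to receive in the scenario described in the issue.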
[jira] [Updated] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions
[ https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2242: - Status: Patch Available (was: Open) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions - Key: HIVE-2242 URL: https://issues.apache.org/jira/browse/HIVE-2242 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2242.1.patch Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to Pre Execution hooks. It should also include all matches from partial specifications. E.g., suppose you have a table {{create table test_table (a string) partitioned by (p1 string, p2 string);}} {{alter table test_table add partition (p1=1, p2=1);}} {{alter table test_table add partition (p1=1, p2=2);}} {{alter table test_table add partition (p1=2, p2=2);}} and you run {{alter table test_table drop partition(p1=1);}} Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get the WriteEntity's with the partitions p1=1/p2=1 and p1=1/p2=2 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions
DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions - Key: HIVE-2242 URL: https://issues.apache.org/jira/browse/HIVE-2242 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to Pre Execution hooks. It should also include all matches from partial specifications. E.g., suppose you have a table {{create table test_table (a string) partitioned by (p1 string, p2 string); alter table test_table add partition (p1=1, p2=1); alter table test_table add partition (p1=1, p2=2); alter table test_table add partition (p1=2, p2=2); }} and you run {{alter table test_table drop partition(p1=1);}} Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get the WriteEntity's with the partitions p1=1/p2=1 and p1=1/p2=2 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions
[ https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2242: - Description: Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to Pre Execution hooks. It should also include all matches from partial specifications. E.g., suppose you have a table {{create table test_table (a string) partitioned by (p1 string, p2 string);}} {{alter table test_table add partition (p1=1, p2=1);}} {{alter table test_table add partition (p1=1, p2=2);}} {{alter table test_table add partition (p1=2, p2=2);}} and you run {{alter table test_table drop partition(p1=1);}} Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get the WriteEntity's with the partitions p1=1/p2=1 and p1=1/p2=2 was: Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to Pre Execution hooks. It should also include all matches from partial specifications. E.g., suppose you have a table {{create table test_table (a string) partitioned by (p1 string, p2 string); alter table test_table add partition (p1=1, p2=1); alter table test_table add partition (p1=1, p2=2); alter table test_table add partition (p1=2, p2=2); }} and you run {{alter table test_table drop partition(p1=1);}} Pre-execution hooks will not be passed any of the partitions. 
The expected behavior is for pre-execution hooks to get the WriteEntity's with the partitions p1=1/p2=1 and p1=1/p2=2 DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions - Key: HIVE-2242 URL: https://issues.apache.org/jira/browse/HIVE-2242 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Currently, when dropping partitions, the DDL Semantic Analyzer only passes partitions that have a full specification to Pre Execution hooks. It should also include all matches from partial specifications. E.g., suppose you have a table {{create table test_table (a string) partitioned by (p1 string, p2 string);}} {{alter table test_table add partition (p1=1, p2=1);}} {{alter table test_table add partition (p1=1, p2=2);}} {{alter table test_table add partition (p1=2, p2=2);}} and you run {{alter table test_table drop partition(p1=1);}} Pre-execution hooks will not be passed any of the partitions. The expected behavior is for pre-execution hooks to get the WriteEntity's with the partitions p1=1/p2=1 and p1=1/p2=2 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners
[ https://issues.apache.org/jira/browse/HIVE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2194: - Attachment: HIVE-2194.3.patch Add actions for alter table and alter partition events for metastore event listeners Key: HIVE-2194 URL: https://issues.apache.org/jira/browse/HIVE-2194 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2194.1.patch, HIVE-2194.3.patch HIVE-2038 introduced the MetaStoreEventListener abstract class that defines actions to be performed after particular events on a metastore. Improve upon that class by adding events to be performed on alter table and alter partition actions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051333#comment-13051333 ] Sohan Jain commented on HIVE-2213: -- I'd also like to point one more thing out. The previous implementation of get_partitions_ps_with_auth() did not actually make use of the inputted user name or group name, nor did it set any auth privileges on the desired partitions. This patch adds authentication privileges, which unfortunately slows down get_partitions_ps_with_auth(), since we have to iterate through all of the partitions and set privileges before returning them. What is the desired behavior here? Optimize get_partition_names_ps() - Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2213.1.patch, HIVE-2213.3.patch If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2213: - Status: Patch Available (was: Open) Optimize get_partition_names_ps() - Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2213.1.patch, HIVE-2213.3.patch If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2213: - Attachment: HIVE-2213.3.patch -Fixed line that exceeded 100 chars Optimize get_partition_names_ps() - Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2213.1.patch, HIVE-2213.3.patch If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2219) Make alter table drop partition more efficient
[ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2219: - Status: Open (was: Patch Available) Make alter table drop partition more efficient Key: HIVE-2219 URL: https://issues.apache.org/jira/browse/HIVE-2219 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2219.1.patch The current function dropTable() that handles dropping multiple partitions is somewhat inefficient. For each partition you want to drop, it loops through each partition in the table to see if the partition exists. This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table. The running time of this function can be improved, which is useful for tables with many partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2219) Make alter table drop partition more efficient
[ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050874#comment-13050874 ] Sohan Jain commented on HIVE-2219: -- Ah sorry, after another round of testing, I realized this doesn't work correctly at all for partial partition specs! I will re-implement it and test again for speed / full correctness. Make alter table drop partition more efficient Key: HIVE-2219 URL: https://issues.apache.org/jira/browse/HIVE-2219 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2219.1.patch The current function dropTable() that handles dropping multiple partitions is somewhat inefficient. For each partition you want to drop, it loops through each partition in the table to see if the partition exists. This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table. The running time of this function can be improved, which is useful for tables with many partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similarly to the one in HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.
[ https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2226: - Attachment: HIVE-2226.1.patch Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc. --- Key: HIVE-2226 URL: https://issues.apache.org/jira/browse/HIVE-2226 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2226.1.patch Create a function called get_table_names_by_filter that returns a list of table names in a database that match a certain filter. The filter should operate similarly to the one in HIVE-1609. Initially, you should be able to prune the table list based on owner, retention, or table parameter key/values. The filtering should take place at the JDO level for efficiency/speed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2219) Make alter table drop partition more efficient
Make alter table drop partition more efficient Key: HIVE-2219 URL: https://issues.apache.org/jira/browse/HIVE-2219 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Sohan Jain The current function dropTable() that handles dropping multiple partitions is somewhat inefficient. For each partition you want to drop, it loops through each partition in the table to see if the partition exists. This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table. The running time of this function can be improved, which is useful for tables with many partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2219) Make alter table drop partition more efficient
[ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain reassigned HIVE-2219: Assignee: Sohan Jain Make alter table drop partition more efficient Key: HIVE-2219 URL: https://issues.apache.org/jira/browse/HIVE-2219 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain The current function dropTable() that handles dropping multiple partitions is somewhat inefficient. For each partition you want to drop, it loops through each partition in the table to see if the partition exists. This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table. The running time of this function can be improved, which is useful for tables with many partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2219) Make alter table drop partition more efficient
[ https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2219: - Attachment: HIVE-2219.1.patch Improves the time it takes to check whether a partition to delete exists in the lists of partitions. Overall improves the complexity to _O(m + n)_ Make alter table drop partition more efficient Key: HIVE-2219 URL: https://issues.apache.org/jira/browse/HIVE-2219 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2219.1.patch The current function dropTable() that handles dropping multiple partitions is somewhat inefficient. For each partition you want to drop, it loops through each partition in the table to see if the partition exists. This is an _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is the number of partitions in the table. The running time of this function can be improved, which is useful for tables with many partitions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
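The complexity change described above can be illustrated with a hash-based existence check. This is a generic sketch rather than the contents of HIVE-2219.1.patch: the function names are hypothetical, partitions are represented by their full names only, and partial partition specs (which the follow-up comment on this issue says the first patch mishandled) are not covered.

```python
def partitions_to_drop_naive(requested, existing):
    # O(m * n): each membership test scans the whole list of n partitions.
    return [name for name in requested if name in existing]


def partitions_to_drop_indexed(requested, existing):
    # O(n) to build the set once, then O(1) expected time per lookup,
    # giving O(m + n) overall.
    existing_names = set(existing)
    return [name for name in requested if name in existing_names]


existing = ["p1=1/p2=1", "p1=1/p2=2", "p1=2/p2=2"]
requested = ["p1=1/p2=2", "p1=9/p2=9", "p1=2/p2=2"]

naive = partitions_to_drop_naive(requested, existing)
indexed = partitions_to_drop_indexed(requested, existing)
print(indexed)
```

Both functions return the same partitions in the same order; only the cost of the existence check differs.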
[jira] [Created] (HIVE-2213) Optimize get_partition_names_ps()
Optimize get_partition_names_ps() - Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()
[ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2213: - Attachment: HIVE-2213.1.patch Optimize get_partition_names_ps() - Key: HIVE-2213 URL: https://issues.apache.org/jira/browse/HIVE-2213 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2213.1.patch If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2188) Add get_table_objects_by_name() to Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046983#comment-13046983 ] Sohan Jain commented on HIVE-2188: -- Thank you, Carl. Add get_table_objects_by_name() to Hive MetaStore - Key: HIVE-2188 URL: https://issues.apache.org/jira/browse/HIVE-2188 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Priority: Minor Fix For: 0.8.0 Attachments: HIVE-2188.1.patch, HIVE-2188.3.patch This function would get multiple tables from the hive metastore as opposed to just one at a time, saving round trip time to the metastore. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2188) Add multi_get_table function in Hive Metastore
[ https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2188: - Attachment: HIVE-2188.3.patch Add multi_get_table function in Hive Metastore -- Key: HIVE-2188 URL: https://issues.apache.org/jira/browse/HIVE-2188 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Priority: Minor Attachments: HIVE-2188.1.patch, HIVE-2188.3.patch This function would get multiple tables from the hive metastore as opposed to just one at a time, saving round trip time to the metastore. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners
Add actions for alter table and alter partition events for metastore event listeners Key: HIVE-2194 URL: https://issues.apache.org/jira/browse/HIVE-2194 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain HIVE-2038 introduced the MetaStoreEventListener abstract class that defines actions to be performed after particular events on a metastore. Improve upon that class by adding events to be performed on alter table and alter partition actions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners
[ https://issues.apache.org/jira/browse/HIVE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2194: - Status: Patch Available (was: Open) Add actions for alter table and alter partition events for metastore event listeners Key: HIVE-2194 URL: https://issues.apache.org/jira/browse/HIVE-2194 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2194.1.patch HIVE-2038 introduced the MetaStoreEventListener abstract class that defines actions to be performed after particular events on a metastore. Improve upon that class by adding events to be performed on alter table and alter partition actions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2188) Add multi_get_table function in Hive Metastore
[ https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2188: - Attachment: HIVE-2188.1.patch - Added multiGetTable function to the interface RawStore - Implemented it in ObjectStore imitating the SQL IN operator in JDO - added multi_get_table function to HiveMetaStore that is a wrapper to the RawStore function. Add multi_get_table function in Hive Metastore -- Key: HIVE-2188 URL: https://issues.apache.org/jira/browse/HIVE-2188 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Priority: Minor Attachments: HIVE-2188.1.patch This function would get multiple tables from the hive metastore as opposed to just one at a time, saving round trip time to the metastore. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
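One common way to imitate the SQL IN operator in a query language that lacks it is to expand the list into a chain of equality tests joined by OR, with one parameter per value. The sketch below only builds such a filter string and its parameter map; it is an assumption about the general technique, not a copy of what ObjectStore does in the patch, and the field name `tableName`, the parameter prefix, and the helper name are all hypothetical.

```python
def build_in_filter(field, values, prefix="p"):
    """Expand `field IN (values...)` into `(field == p0 || field == p1 ...)`.

    Returns the filter text and a mapping of parameter name to value, so the
    values are passed as parameters rather than spliced into the query text.
    """
    if not values:
        raise ValueError("an IN list needs at least one value")
    names = [f"{prefix}{i}" for i in range(len(values))]
    clause = " || ".join(f"{field} == {name}" for name in names)
    return f"({clause})", dict(zip(names, values))


filter_text, params = build_in_filter("tableName", ["orders", "users"])
print(filter_text)
print(params)
```

Fetching all requested tables with a single filter of this shape is what saves the per-table round trips mentioned in the issue description.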
[jira] [Updated] (HIVE-2188) Add multi_get_table function in Hive Metastore
[ https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-2188:
    Status: Patch Available (was: Open)

Add multi_get_table function in Hive Metastore
    Key: HIVE-2188
    URL: https://issues.apache.org/jira/browse/HIVE-2188
    Project: Hive
    Issue Type: New Feature
    Components: Metastore
    Reporter: Sohan Jain
    Assignee: Sohan Jain
    Priority: Minor
    Attachments: HIVE-2188.1.patch

This function would get multiple tables from the Hive metastore as opposed to just one at a time, saving round trip time to the metastore.
[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct
[ https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-1595:
    Status: Patch Available (was: Open)

Removed the README edit and unnecessary diffs to the import statements.

job name for alter table T archive partition P is not correct
    Key: HIVE-1595
    URL: https://issues.apache.org/jira/browse/HIVE-1595
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Namit Jain
    Assignee: Sohan Jain
    Attachments: Hive-1595.1.patch, Hive-1595.2.patch

For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which makes the job difficult to identify.
[jira] [Created] (HIVE-2188) Add multi_get_table function in Hive Metastore
Add multi_get_table function in Hive Metastore
    Key: HIVE-2188
    URL: https://issues.apache.org/jira/browse/HIVE-2188
    Project: Hive
    Issue Type: New Feature
    Components: Metastore
    Reporter: Sohan Jain
    Assignee: Sohan Jain
    Priority: Minor

This function would get multiple tables from the Hive metastore as opposed to just one at a time, saving round trip time to the metastore.
[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct
[ https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-1595:
    Attachment: Hive-1595.1.patch

job name for alter table T archive partition P is not correct
    Key: HIVE-1595
    URL: https://issues.apache.org/jira/browse/HIVE-1595
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Namit Jain
    Assignee: Sohan Jain
    Attachments: Hive-1595.1.patch

For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which makes the job difficult to identify.
[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct
[ https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sohan Jain updated HIVE-1595:
    Status: Patch Available (was: Open)

job name for alter table T archive partition P is not correct
    Key: HIVE-1595
    URL: https://issues.apache.org/jira/browse/HIVE-1595
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Namit Jain
    Assignee: Sohan Jain
    Attachments: Hive-1595.1.patch

For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which makes the job difficult to identify.