[jira] [Commented] (HIVE-21917) COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
[ https://issues.apache.org/jira/browse/HIVE-21917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976163#comment-16976163 ]

Craig Condit commented on HIVE-21917:
-------------------------------------

[~dkuzmenko], checkInterval appears to be used as both a local variable and a class variable. Is this intended?

> COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
> ------------------------------------------------------------------------
>
>                 Key: HIVE-21917
>                 URL: https://issues.apache.org/jira/browse/HIVE-21917
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.0, 3.1.1
>            Reporter: Craig Condit
>            Assignee: Denys Kuzmenko
>            Priority: Major
>         Attachments: HIVE-21917.1.patch, HIVE-21917.2.patch,
> HIVE-21917.3.patch
>
>
> The Initiator thread in the metastore repeatedly loops over entries in the
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might
> need to be compacted. However, entries are never removed from this table
> except by a completed Compactor run.
> In a cluster where most tables / partitions are write-once read-many, this
> results in stale entries in this table never being cleaned up. In a small
> test cluster, we have observed approximately 45k entries in this table
> (virtually equal to the number of partitions in the cluster) while < 100 of
> these tables have delta files at all. Since most of the tables will never get
> enough writes to trigger a compaction (and in fact have only ever been
> written to once), the initiator thread keeps trying to evaluate them on every
> loop.
> On this test cluster, it takes approximately 10 minutes to loop through all
> the entries and results in severe performance degradation on metastore
> operations. With the default run timing of 5 minutes, the initiator basically
> never stops running.
> On a production cluster with 2M partitions, this would be a non-starter.
> The initiator thread should proactively remove entries from
> COMPLETED_TXN_COMPONENTS when it determines that a compaction is not needed,
> so that they are not evaluated again on the next loop.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
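
The behaviour proposed in HIVE-21917 can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hive's Initiator code: the `Entry` class, the `initiator_pass` function, the `delta_counts` mapping, and the `threshold` parameter are all hypothetical names invented for illustration. It shows one pass that keeps entries which qualify for compaction (they remain until a Compactor run completes) and drops entries for which no compaction is needed, so a later pass has less to scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of a toy COMPLETED_TXN_COMPONENTS table (hypothetical shape)."""
    table: str
    partition: str


def initiator_pass(entries, delta_counts, threshold):
    """Run one toy initiator pass over `entries`.

    `delta_counts` maps an Entry to its number of delta files (assumed input).
    Returns (to_compact, remaining):
      to_compact - entries whose delta count reached `threshold`
      remaining  - entries still in the table after this pass
    """
    to_compact = []
    remaining = []
    for entry in entries:
        if delta_counts.get(entry, 0) >= threshold:
            # Needs compaction: keep the row until a Compactor run completes.
            to_compact.append(entry)
            remaining.append(entry)
        # Otherwise the row is dropped, so the next pass does not re-evaluate it.
    return to_compact, remaining


if __name__ == "__main__":
    rows = [Entry("t1", "p=1"), Entry("t2", "p=1"), Entry("t3", "p=1")]
    counts = {rows[0]: 12}  # only t1 has accumulated delta files
    compact, left = initiator_pass(rows, counts, threshold=10)
    print(len(rows), "rows scanned;", len(left), "rows left for the next pass")
```

In this model the cost of each pass is proportional to the number of rows that still need attention rather than to the total number of partitions ever written.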
[jira] [Updated] (HIVE-21917) COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
[ https://issues.apache.org/jira/browse/HIVE-21917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit updated HIVE-21917:
--------------------------------
    Summary: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs  (was: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compator runs)

> COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
> ------------------------------------------------------------------------
>
>                 Key: HIVE-21917
>                 URL: https://issues.apache.org/jira/browse/HIVE-21917
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.0, 3.1.1
>            Reporter: Craig Condit
>            Priority: Major
>
> The Initiator thread in the metastore repeatedly loops over entries in the
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might
> need to be compacted. However, entries are never removed from this table
> except by a completed Compactor run.
> In a cluster where most tables / partitions are write-once read-many, this
> results in stale entries in this table never being cleaned up. In a small
> test cluster, we have observed approximately 45k entries in this table
> (virtually equal to the number of partitions in the cluster) while < 100 of
> these tables have delta files at all. Since most of the tables will never get
> enough writes to trigger a compaction (and in fact have only ever been
> written to once), the initiator thread keeps trying to evaluate them on every
> loop.
> On this test cluster, it takes approximately 10 minutes to loop through all
> the entries and results in severe performance degradation on metastore
> operations. With the default run timing of 5 minutes, the initiator basically
> never stops running.
> On a production cluster with 2M partitions, this would be a non-starter.
> The initiator thread should proactively remove entries from
> COMPLETED_TXN_COMPONENTS when it determines that a compaction is not needed,
> so that they are not evaluated again on the next loop.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (HIVE-21631) Enhance metastore API to allow bulk-loading materialized views
[ https://issues.apache.org/jira/browse/HIVE-21631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit reassigned HIVE-21631:
-----------------------------------

    Assignee: Jesus Camacho Rodriguez

> Enhance metastore API to allow bulk-loading materialized views
> --------------------------------------------------------------
>
>                 Key: HIVE-21631
>                 URL: https://issues.apache.org/jira/browse/HIVE-21631
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Materialized views, Metastore
>    Affects Versions: 3.2.0, 3.1.1
>            Reporter: Craig Condit
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>
> Currently, every query in HS2 results in a metastore call per database to
> retrieve all materialized views. This causes severe performance degradation
> on multi-tenant clusters with thousands of databases (very similar to how the
> old get_function() metastore call didn't scale).
> We should add a metastore call which can retrieve all materialized view
> definitions at once (for all DBs) so that we don't have to make thousands of
> metastore calls per query.
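
The scaling difference described in HIVE-21631 can be made concrete with a toy call counter. This is a rough sketch, not the Hive metastore API: `ToyMetastore`, `get_materialized_views`, `get_all_materialized_views`, and the two loader functions are hypothetical names made up for illustration. It contrasts one round trip per database with a single bulk call that returns the definitions for all databases.

```python
class ToyMetastore:
    """In-memory stand-in for a metastore that counts round trips (hypothetical)."""

    def __init__(self, views_by_db):
        self._views_by_db = views_by_db
        self.calls = 0

    def get_materialized_views(self, db):
        """Per-database lookup: one round trip for each database asked about."""
        self.calls += 1
        return list(self._views_by_db.get(db, []))

    def get_all_materialized_views(self):
        """Hypothetical bulk lookup: one round trip covering every database."""
        self.calls += 1
        return [view for views in self._views_by_db.values() for view in views]


def load_views_per_db(metastore, dbs):
    """Collect views by querying each database separately."""
    views = []
    for db in dbs:
        views.extend(metastore.get_materialized_views(db))
    return views


def load_views_bulk(metastore):
    """Collect views with a single bulk request."""
    return metastore.get_all_materialized_views()


if __name__ == "__main__":
    # 1000 databases, of which every 100th holds one materialized view.
    catalog = {f"db{i}": ([f"db{i}.mv"] if i % 100 == 0 else []) for i in range(1000)}
    per_db, bulk = ToyMetastore(catalog), ToyMetastore(catalog)
    load_views_per_db(per_db, list(catalog))
    load_views_bulk(bulk)
    print("per-database calls:", per_db.calls, "bulk calls:", bulk.calls)
```

Both loaders return the same set of views; only the number of round trips differs, which is the cost the issue attributes to every HS2 query.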
[jira] [Updated] (HIVE-21631) Enhance metastore API to allow bulk-loading materialized views
[ https://issues.apache.org/jira/browse/HIVE-21631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit updated HIVE-21631:
--------------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: HIVE-14484

> Enhance metastore API to allow bulk-loading materialized views
> --------------------------------------------------------------
>
>                 Key: HIVE-21631
>                 URL: https://issues.apache.org/jira/browse/HIVE-21631
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Materialized views, Metastore
>    Affects Versions: 3.2.0, 3.1.1
>            Reporter: Craig Condit
>            Priority: Major
>
> Currently, every query in HS2 results in a metastore call per database to
> retrieve all materialized views. This causes severe performance degradation
> on multi-tenant clusters with thousands of databases (very similar to how the
> old get_function() metastore call didn't scale).
> We should add a metastore call which can retrieve all materialized view
> definitions at once (for all DBs) so that we don't have to make thousands of
> metastore calls per query.
[jira] [Reopened] (HIVE-14484) Extensions for initial materialized views implementation
[ https://issues.apache.org/jira/browse/HIVE-14484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Condit reopened HIVE-14484:
---------------------------------

> Extensions for initial materialized views implementation
> --------------------------------------------------------
>
>                 Key: HIVE-14484
>                 URL: https://issues.apache.org/jira/browse/HIVE-14484
>             Project: Hive
>          Issue Type: Improvement
>          Components: Materialized views
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>             Fix For: 3.2.0
>
>
> Follow-up of HIVE-14249.