[jira] [Commented] (HIVE-21917) COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-17 Thread Craig Condit (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976163#comment-16976163
 ] 

Craig Condit commented on HIVE-21917:
-

[~dkuzmenko], checkInterval appears to be used as both a local variable and a 
class variable. Is this intended?
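For readers unfamiliar with the concern: in Java, a local variable declared with the same name as a field shadows that field for the rest of the method, so reads and writes silently hit the local rather than the class variable. The sketch below is a hypothetical class that only borrows the name `checkInterval` from the comment. It does not reproduce the patch's code, and its values are made up.

```java
// Hypothetical class illustrating field shadowing; not taken from the HIVE-21917 patch.
class ShadowingDemo {
    // Field (class variable); the value is illustrative only.
    private long checkInterval = 300_000L;

    long localShadowsField() {
        // This local declaration hides the field for the rest of the method.
        long checkInterval = 60_000L;
        return checkInterval; // refers to the local variable, not the field
    }

    long readField() {
        return this.checkInterval; // explicit 'this' always reaches the field
    }

    public static void main(String[] args) {
        ShadowingDemo d = new ShadowingDemo();
        System.out.println(d.localShadowsField()); // 60000
        System.out.println(d.readField());         // 300000
    }
}
```

If the local and the field are meant to hold the same value, assigning to `this.checkInterval` (or renaming the local) avoids the ambiguity.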

> COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
> 
>
> Key: HIVE-21917
> URL: https://issues.apache.org/jira/browse/HIVE-21917
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Craig Condit
>Assignee: Denys Kuzmenko
>Priority: Major
> Attachments: HIVE-21917.1.patch, HIVE-21917.2.patch, 
> HIVE-21917.3.patch
>
>
> The Initiator thread in the metastore repeatedly loops over entries in the 
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
> need to be compacted. However, entries are never removed from this table 
> except by a completed Compactor run.
> In a cluster where most tables / partitions are write-once read-many, this 
> results in stale entries in this table never being cleaned up. In a small 
> test cluster, we have observed approximately 45k entries in this table 
> (roughly equal to the number of partitions in the cluster), while fewer than 100 of 
> these tables have any delta files. Since most of the tables will never get 
> enough writes to trigger a compaction (and in fact have only ever been 
> written to once), the initiator thread keeps trying to evaluate them on every 
> loop.
> On this test cluster, it takes approximately 10 minutes to loop through all 
> the entries and results in severe performance degradation on metastore 
> operations. With the default run timing of 5 minutes, the initiator basically 
> never stops running.
> On a production cluster with 2M partitions, this would be a non-starter.
> The initiator thread should proactively remove entries from 
> COMPLETED_TXN_COMPONENTS when it determines that a compaction is not needed, 
> so that they are not evaluated again on the next loop.
>  
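The cleanup proposed in the description can be sketched as a single Initiator pass over an in-memory list. This is a hypothetical model, not Hive's metastore schema or API: each COMPLETED_TXN_COMPONENTS row is reduced to a string, and `needsCompaction` stands in for whatever check the Initiator actually performs. Entries that need compaction stay pending (a completed Compactor run removes them, as today); entries that do not are dropped so the next loop skips them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of one Initiator loop; names are illustrative only.
class InitiatorSketch {
    // Evaluates every pending component once and returns the compaction candidates.
    // Components that do not need compaction are removed from 'pending' in place,
    // which is the proactive cleanup the issue asks for.
    static List<String> runOnce(List<String> pending, Predicate<String> needsCompaction) {
        List<String> candidates = new ArrayList<>();
        Iterator<String> it = pending.iterator();
        while (it.hasNext()) {
            String component = it.next();
            if (needsCompaction.test(component)) {
                // Left in 'pending'; a completed compaction would clear it later.
                candidates.add(component);
            } else {
                it.remove(); // not evaluated again on the next loop
            }
        }
        return candidates;
    }
}
```

With this pruning, the work per loop tracks the number of recently written partitions rather than the total number of partitions ever written.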



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21917) COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-06-24 Thread Craig Condit (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit updated HIVE-21917:

Summary: COMPLETED_TXN_COMPONENTS table is never cleaned up unless 
Compactor runs  (was: COMPLETED_TXN_COMPONENTS table is never cleaned up unless 
Compator runs)

> COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs
> 
>
> Key: HIVE-21917
> URL: https://issues.apache.org/jira/browse/HIVE-21917
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Craig Condit
>Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21631) Enhance metastore API to allow bulk-loading materialized views

2019-04-18 Thread Craig Condit (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reassigned HIVE-21631:
---

Assignee: Jesus Camacho Rodriguez

> Enhance metastore API to allow bulk-loading materialized views
> --
>
> Key: HIVE-21631
> URL: https://issues.apache.org/jira/browse/HIVE-21631
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views, Metastore
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Craig Condit
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> Currently, every query in HS2 results in a metastore call per database to 
> retrieve all materialized views. This causes severe performance degradation 
> on multi-tenant clusters with thousands of databases (very similar to how the 
> old get_function() metastore call didn't scale).
> We should add a metastore call which can retrieve all materialized view 
> definitions at once (for all DBs) so that we don't have to make thousands of 
> metastore calls per query.
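The difference between the current per-database lookups and the proposed bulk call can be sketched as follows. The `MvClient` interface and its two methods are hypothetical stand-ins, not the Hive metastore Thrift API. The point is only the call count: the per-database loader makes one round trip per database, while the bulk loader makes one in total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical metastore client; method names are illustrative, not Hive's API.
interface MvClient {
    List<String> getMaterializedViewsForDb(String db);   // one round trip per call
    Map<String, List<String>> getAllMaterializedViews();  // one round trip in total
}

class MvLoader {
    // Pattern described in the issue: one metastore call per database.
    static List<String> loadPerDb(MvClient client, List<String> dbs) {
        List<String> all = new ArrayList<>();
        for (String db : dbs) {
            all.addAll(client.getMaterializedViewsForDb(db));
        }
        return all;
    }

    // Proposed pattern: a single bulk call covering every database.
    static List<String> loadBulk(MvClient client) {
        List<String> all = new ArrayList<>();
        client.getAllMaterializedViews().values().forEach(all::addAll);
        return all;
    }
}
```

On a cluster with thousands of databases, the first loader issues thousands of metastore calls per query, while the second issues one regardless of the database count.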





[jira] [Updated] (HIVE-21631) Enhance metastore API to allow bulk-loading materialized views

2019-04-18 Thread Craig Condit (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit updated HIVE-21631:

Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-14484

> Enhance metastore API to allow bulk-loading materialized views
> --
>
> Key: HIVE-21631
> URL: https://issues.apache.org/jira/browse/HIVE-21631
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views, Metastore
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Craig Condit
>Priority: Major
>





[jira] [Reopened] (HIVE-14484) Extensions for initial materialized views implementation

2019-04-18 Thread Craig Condit (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reopened HIVE-14484:
-

> Extensions for initial materialized views implementation
> 
>
> Key: HIVE-14484
> URL: https://issues.apache.org/jira/browse/HIVE-14484
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 3.2.0
>
>
> Follow-up of HIVE-14249.


