Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-22 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/#review218762
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
Line 83 (patched)


Please use the constants here.
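
For illustration only: if the comment refers to compaction state literals such as 'a' and 'f' embedded in a query string, the change could look roughly like the sketch below. The class and constant names are hypothetical stand-ins, not necessarily the names used in the Hive source, and the column name is taken from the query discussed elsewhere in this thread.

```java
final class CompactionStateSql {
    // Hypothetical stand-ins for the state constants; the real names
    // and their home class in Hive may differ.
    static final char ATTEMPTED_STATE = 'a';
    static final char FAILED_STATE = 'f';

    // Renders the given state characters as a quoted, comma-separated SQL list.
    static String quotedList(char... states) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < states.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('\'').append(states[i]).append('\'');
        }
        return sb.toString();
    }

    // Builds the state filter from the constants instead of inline literals.
    static String stateFilter() {
        return "CC_STATE IN (" + quotedList(ATTEMPTED_STATE, FAILED_STATE) + ")";
    }
}
```

With this shape, a state character is defined in exactly one place and the query text follows it.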


- Peter Vary


On Nov. 21, 2019, 5:35 p.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71792/
> ---
> 
> (Updated Nov. 21, 2019, 5:35 p.m.)
> 
> 
> Review request for hive, Laszlo Pinter and Peter Vary.
> 
> 
> Bugs: HIVE-21917
> https://issues.apache.org/jira/browse/HIVE-21917
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The Initiator thread in the metastore repeatedly loops over entries in the 
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
> need to be compacted. However, entries are never removed from this table 
> except by a completed Compactor run.
> 
> In a cluster where most tables / partitions are write-once read-many, this 
> results in stale entries in this table never being cleaned up. In a small 
> test cluster, we have observed approximately 45k entries in this table 
> (virtually equal to the number of partitions in the cluster) while < 100 of 
> these tables have delta files at all. Since most of the tables will never get 
> enough writes to trigger a compaction (and in fact have only ever been 
> written to once), the initiator thread keeps trying to evaluate them on every 
> loop.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 610cf05204
>   ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java b28b57779b
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 8253ccb9c9
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 268038795b
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java e840758c9d
> 
> 
> Diff: https://reviews.apache.org/r/71792/diff/2/
> 
> 
> Testing
> ---
> 
> Unit tests
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-22 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/#review218754
---


Ship it!




Lgtm +1

- Laszlo Pinter





Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-21 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/
---

(Updated Nov. 21, 2019, 5:35 p.m.)


Review request for hive, Laszlo Pinter and Peter Vary.


Bugs: HIVE-21917
https://issues.apache.org/jira/browse/HIVE-21917


Repository: hive-git


Description
---

The Initiator thread in the metastore repeatedly loops over entries in the 
COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
need to be compacted. However, entries are never removed from this table except 
by a completed Compactor run.

In a cluster where most tables / partitions are write-once read-many, this 
results in stale entries in this table never being cleaned up. In a small test 
cluster, we have observed approximately 45k entries in this table (virtually 
equal to the number of partitions in the cluster) while < 100 of these tables 
have delta files at all. Since most of the tables will never get enough writes 
to trigger a compaction (and in fact have only ever been written to once), the 
initiator thread keeps trying to evaluate them on every loop.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 610cf05204
  ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java b28b57779b
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 8253ccb9c9
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 268038795b
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java e840758c9d


Diff: https://reviews.apache.org/r/71792/diff/2/

Changes: https://reviews.apache.org/r/71792/diff/1-2/


Testing
---

Unit tests


Thanks,

Denys Kuzmenko



Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-21 Thread Denys Kuzmenko via Review Board


> On Nov. 20, 2019, 3:19 p.m., Denys Kuzmenko wrote:
> > Not ready. Need to handle aborted and currently active compactions.
> 
> Denys Kuzmenko wrote:
> Handling the above cases would complicate the Initiator logic and make the 
> preliminary check longer. I'm not sure how critical it is that, after an 
> unsuccessful compaction attempt, the next run won't retry unless there is 
> some change to the selected table/partition. Any thoughts on this?

Changed the findPotentialCompactions query to:

select distinct ctc_database, ctc_table, ctc_partition
from COMPLETED_TXN_COMPONENTS
where (select CC_STATE from COMPLETED_COMPACTIONS
       where ctc_database = CC_DATABASE and ctc_table = CC_TABLE
         and (ctc_partition is null or ctc_partition = cc_partition)
       order by cc_id desc limit 1) IN ('a', 'f')
   || ctc_timestamp < current_timestamp

However, this still won't cover compactions that were skipped because one was already running.
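
To make the selection rule easier to discuss, here is one reading of that query as an in-memory sketch. It is not the patch itself. It assumes `||` is meant as a logical OR, uses a `now` parameter in place of `current_timestamp`, and the class, field, and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PotentialCompactions {
    // Hypothetical in-memory shape of a COMPLETED_TXN_COMPONENTS row.
    static final class TxnComponent {
        final String db, table, partition; // partition may be null
        final long timestamp;
        TxnComponent(String db, String table, String partition, long timestamp) {
            this.db = db; this.table = table; this.partition = partition; this.timestamp = timestamp;
        }
    }

    // Hypothetical in-memory shape of a COMPLETED_COMPACTIONS row.
    static final class CompletedCompaction {
        final long id;
        final String db, table, partition; // partition may be null
        final char state;
        CompletedCompaction(long id, String db, String table, String partition, char state) {
            this.id = id; this.db = db; this.table = table; this.partition = partition; this.state = state;
        }
    }

    // Mirrors the subquery: take the matching compaction with the highest id
    // and check whether its state is 'a' or 'f'. No match counts as false.
    static boolean lastAttemptUnsuccessful(TxnComponent c, List<CompletedCompaction> history) {
        return history.stream()
            .filter(cc -> cc.db.equals(c.db) && cc.table.equals(c.table)
                && (c.partition == null || c.partition.equals(cc.partition)))
            .max(Comparator.comparingLong((CompletedCompaction cc) -> cc.id))
            .map(cc -> cc.state == 'a' || cc.state == 'f')
            .orElse(false);
    }

    // Mirrors "select distinct ... where <subquery> IN ('a','f') OR ctc_timestamp < now".
    static Set<String> find(List<TxnComponent> components, List<CompletedCompaction> history, long now) {
        Set<String> result = new LinkedHashSet<>();
        for (TxnComponent c : components) {
            if (lastAttemptUnsuccessful(c, history) || c.timestamp < now) {
                result.add(c.db + "." + c.table + (c.partition == null ? "" : "/" + c.partition));
            }
        }
        return result;
    }
}
```

Under this reading, an entry whose most recent compaction succeeded is only selected through the timestamp condition, and a compaction that was skipped never reaches COMPLETED_COMPACTIONS, so the state check cannot see it.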


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/#review218723
---


On Nov. 20, 2019, 12:20 p.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71792/
> ---
> 
> (Updated Nov. 20, 2019, 12:20 p.m.)
> 
> 
> Review request for hive, Laszlo Pinter and Peter Vary.
> 
> 
> Bugs: HIVE-21917
> https://issues.apache.org/jira/browse/HIVE-21917
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The Initiator thread in the metastore repeatedly loops over entries in the 
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
> need to be compacted. However, entries are never removed from this table 
> except by a completed Compactor run.
> 
> In a cluster where most tables / partitions are write-once read-many, this 
> results in stale entries in this table never being cleaned up. In a small 
> test cluster, we have observed approximately 45k entries in this table 
> (virtually equal to the number of partitions in the cluster) while < 100 of 
> these tables have delta files at all. Since most of the tables will never get 
> enough writes to trigger a compaction (and in fact have only ever been 
> written to once), the initiator thread keeps trying to evaluate them on every 
> loop.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 610cf05204
>   ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java b28b57779b
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 8253ccb9c9
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 6281208247
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java e840758c9d
> 
> 
> Diff: https://reviews.apache.org/r/71792/diff/1/
> 
> 
> Testing
> ---
> 
> Unit tests
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-21 Thread Denys Kuzmenko via Review Board


> On Nov. 20, 2019, 3:19 p.m., Denys Kuzmenko wrote:
> > Not ready. Need to handle aborted and currently active compactions.

Handling the above cases would complicate the Initiator logic and make the 
preliminary check longer. I'm not sure how critical it is that, after an 
unsuccessful compaction attempt, the next run won't retry unless there is 
some change to the selected table/partition. Any thoughts on this?


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/#review218723
---





Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-20 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/#review218723
---



Not ready. Need to handle aborted and currently active compactions.

- Denys Kuzmenko





Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-20 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/
---

Review request for hive, Laszlo Pinter and Peter Vary.


Bugs: HIVE-21917
https://issues.apache.org/jira/browse/HIVE-21917


Repository: hive-git


Description
---

The Initiator thread in the metastore repeatedly loops over entries in the 
COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
need to be compacted. However, entries are never removed from this table except 
by a completed Compactor run.

In a cluster where most tables / partitions are write-once read-many, this 
results in stale entries in this table never being cleaned up. In a small test 
cluster, we have observed approximately 45k entries in this table (virtually 
equal to the number of partitions in the cluster) while < 100 of these tables 
have delta files at all. Since most of the tables will never get enough writes 
to trigger a compaction (and in fact have only ever been written to once), the 
initiator thread keeps trying to evaluate them on every loop.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 610cf05204
  ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java b28b57779b
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 8253ccb9c9
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 6281208247
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java e840758c9d


Diff: https://reviews.apache.org/r/71792/diff/1/


Testing
---

Unit tests


Thanks,

Denys Kuzmenko