[jira] [Commented] (HBASE-25972) Dual File Compaction

2024-05-18 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847621#comment-17847621
 ] 

Hudson commented on HBASE-25972:


Results for branch branch-2.6
[build #120 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/120/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/120/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/120/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/120/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/120/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Dual File Compaction
> 
>
> Key: HBASE-25972
> URL: https://issues.apache.org/jira/browse/HBASE-25972
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kadir Ozdemir
>Assignee: Kadir Ozdemir
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> HBase stores tables row by row in its files, HFiles. An HFile is composed of 
> blocks, and the number of rows stored in a block depends on the row sizes. The 
> number of rows per block drops as rows grow larger on disk due to multiple 
> row versions, since HBase stores all row versions sequentially in the same 
> HFile after compaction. However, applications (e.g., Phoenix) mostly query 
> the most recent row versions.
> The default compactor in HBase compacts HFiles into one file. This Jira 
> introduces a new store file writer, DualFileWriter, which writes the cells 
> retained by compaction into two files. One file holds the live cells and is 
> called a live-version file; the other holds the remaining cells, i.e., the 
> historical versions, and is called a historical-version file. DualFileWriter 
> works with the default compactor.
> Historical files are not read by scans that request only the latest row 
> versions. This avoids scanning unnecessary cell versions in compacted files 
> and is therefore expected to improve the performance of such scans.
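To make the split concrete, here is a simplified sketch of the routing rule described above. It is an illustration of the idea only, not the actual DualFileWriter code; the class and method names are hypothetical, and the real visibility rules (TTLs, max versions, etc.) are more involved:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the dual-file split: during compaction, route each
// retained cell either to the live-version file or the historical-version file.
public class DualFileSketch {
    // Minimal stand-in for an HBase cell: (row, qualifier, delete-marker flag).
    public record Cell(String row, String qualifier, boolean deleteMarker) {}

    public final List<Cell> liveFile = new ArrayList<>();
    public final List<Cell> historicalFile = new ArrayList<>();

    // Cells arrive in compaction order: all versions of a (row, qualifier)
    // column are adjacent, newest first. The newest unmasked put of each
    // column goes to the live file; delete markers and older versions are
    // historical.
    public void write(List<Cell> compactedCells) {
        String currentColumn = null;
        boolean liveSlotTaken = false;
        for (Cell c : compactedCells) {
            String column = c.row() + "/" + c.qualifier();
            if (!column.equals(currentColumn)) {
                currentColumn = column;
                liveSlotTaken = false;
            }
            if (c.deleteMarker()) {
                historicalFile.add(c);   // a marker masks everything older
                liveSlotTaken = true;
            } else if (!liveSlotTaken) {
                liveFile.add(c);         // newest visible version
                liveSlotTaken = true;
            } else {
                historicalFile.add(c);   // older version
            }
        }
    }
}
```

For example, given the cells r1/c1:put, r1/c1:put (older), r2/c1:delete, r2/c1:put in that order, only the first put of r1/c1 lands in the live file; the other three cells go to the historical file.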



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-25972) Dual File Compaction

2024-05-18 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847613#comment-17847613
 ] 

Hudson commented on HBASE-25972:


Results for branch branch-2.5
[build #530 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/530/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/530/General_20Nightly_20Build_20Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/530//console].


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/530/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/530//console].


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25972) Dual File Compaction

2024-05-18 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847599#comment-17847599
 ] 

Hudson commented on HBASE-25972:


Results for branch branch-3
[build #207 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/207/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/207/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/207/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/207//console].


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25972) Dual File Compaction

2024-05-18 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847492#comment-17847492
 ] 

Hudson commented on HBASE-25972:


Results for branch branch-2
[build #1057 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1057/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1057/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1057/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1057/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1057/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25972) Dual File Compaction

2024-05-17 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847456#comment-17847456
 ] 

Hudson commented on HBASE-25972:


Results for branch master
[build #1073 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1073/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1073/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1073/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1073/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-25972) Dual File Compaction

2024-05-17 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847389#comment-17847389
 ] 

Andrew Kyle Purtell commented on HBASE-25972:
-

And fwiw we have plans to move up to 2.6 soon and also stay there for a while. 



[jira] [Commented] (HBASE-25972) Dual File Compaction

2024-05-17 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847388#comment-17847388
 ] 

Andrew Kyle Purtell commented on HBASE-25972:
-

Thanks [~bbeaudreault]. I'm only waiting on another precommit to confirm a 
spotless fix and then will merge this back all the way through 2.6 into 2.5. 



[jira] [Commented] (HBASE-25972) Dual File Compaction

2024-05-17 Thread Bryan Beaudreault (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847374#comment-17847374
 ] 

Bryan Beaudreault commented on HBASE-25972:
---

As a general philosophy, I'd love to move towards bug fixes only in patch 
versions. So that we can push out more minor/major releases. But I feel like 
we're a ways away from having the bandwidth for that or knowing what that means 
for supporting older versions, etc.

Selfishly, I'd love to get this feature in 2.6.x because I plan to stay on this 
release line at my company for a while and we have an interest in that.

So I realize this doesn't sound very internally consistent, but since there are 
no compatibility issues I think it'd be nice to get into branch-2.6.



[jira] [Commented] (HBASE-25972) Dual File Compaction

2024-05-17 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847372#comment-17847372
 ] 

Andrew Kyle Purtell commented on HBASE-25972:
-

Summary of compatibility related concerns:
 * New HFile metadata key {{HStoreFile.HISTORICAL_KEY}}. It is correctly handled 
by new code if absent, and queries with old code that doesn't recognize it will 
perform normally. Refer to the design document.
 * In {{CompactionConfiguration}}, {{minFilesToCompact}} is incremented by 1 
only if the new configuration setting for the historical file structure is 
enabled.

All other changes are to Private interfaces. 
I don't think there is anything here that would prevent a backport to release 
lines, aside from the philosophical choice about introducing new features in 
patch versions. Per the design document, the historical file structuring is not 
enabled by default after upgrade. If it is enabled, and the cluster is later 
rolled back, the contents of the HFiles are all still there and readable by the 
older code; only the potential read-time optimization is, obviously, not 
performed by the older code. WDYT [~bbeaudreault]? 
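The two compatibility points above can be sketched roughly as follows. The constant name comes from the comment, but the value encoding and method shapes are assumptions made for illustration, not the actual patch:

```java
import java.util.Map;

// Rough illustration of the two compatibility points: an optional new HFile
// metadata key, and a conditional bump of minFilesToCompact. The byte encoding
// of the key's value and the method signatures here are assumptions.
public class CompatSketch {
    // Stand-in for HStoreFile.HISTORICAL_KEY; the real key's bytes may differ.
    public static final String HISTORICAL_KEY = "HISTORICAL";

    // New readers treat a missing key as "not historical", so files written
    // before the feature (or by old code) behave exactly as before; old
    // readers simply ignore the unknown key.
    public static boolean isHistorical(Map<String, byte[]> fileInfo) {
        byte[] v = fileInfo.get(HISTORICAL_KEY);
        return v != null && v.length > 0 && v[0] != 0;
    }

    // Each compaction emits two files when the feature is on, so the
    // compaction-selection floor grows by one only in that case.
    public static int effectiveMinFilesToCompact(int configured, boolean dualFileEnabled) {
        return dualFileEnabled ? configured + 1 : configured;
    }
}
```

The key design point is that both directions degrade gracefully: old code reading new files sees extra metadata it ignores, and new code reading old files falls back to the pre-feature behavior.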



[jira] [Commented] (HBASE-25972) Dual File Compaction

2023-12-21 Thread Kadir Ozdemir (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799356#comment-17799356
 ] 

Kadir Ozdemir commented on HBASE-25972:
---

I added the sequentialDelete and randomDelete tests to PerformanceEvaluation 
and did some performance testing on a local HBase as follows.
 # Enabled the dual file compaction. 
{code:xml}
<property>
    <name>hbase.hstore.defaultengine.enable.dualfilewriter</name>
    <value>true</value>
</property>
{code}

 # Created a table with one column family and inserted 100 rows with 20 
columns, each column holding a 32-byte value.
{code:java}
bin/hbase pe --nomapred --rows=100 --table=T1 --columns=20 --valueSize=32 
sequentialWrite 1 {code}

 # Set KEEP_DELETED_CELLS to true so that deleted cells are retained after major 
compaction. This simulates minor compaction, where deleted cells are not removed.
{code:java}
alter "T1", {NAME =>"info0", KEEP_DELETED_CELLS => TRUE} {code}

 # Inserted 50 delete family markers at random, which deleted around 32% of 
the inserted rows.
{code:java}
bin/hbase pe --nomapred --rows=50 --table=T1 randomDelete 1 {code}

 # Flushed and major compacted T1.
 # Scanned 100 rows. 
{code:java}
bin/hbase pe --nomapred --rows=100 --table=T1 scan 1  {code}

The above scan took 7040ms.

Stopped the local HBase, disabled the dual file compaction, restarted the local 
HBase, ran major compaction, and repeated the scan test. This time the scan 
took 8539ms.

Then deleted all the rows using 
{code:java}
bin/hbase pe --nomapred --rows=100 --table=T1 sequentialDelete 1 {code}
Scanned the table from the HBase shell; it took 6.9936 seconds.

Stopped the local HBase, enabled the dual file compaction, restarted the local 
HBase, and ran major compaction. Then scanned the table from the HBase shell 
again. This time it took 0.5660 seconds.

These tests confirm the expected performance gain from the dual file 
compaction.
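As a side note on the ~32% figure above: if randomDelete draws its 50 target rows uniformly at random, allowing repeats, from the 100 rows, the expected fraction of distinct rows deleted is 1 - (1 - 1/n)^k. This back-of-the-envelope check is my own sketch, not part of the test:

```java
// Expected fraction of distinct rows hit by k uniform random deletes over n
// rows, allowing repeats: a given row survives one delete with probability
// 1 - 1/n, so it survives all k deletes with probability (1 - 1/n)^k.
public class DeleteCoverage {
    public static double expectedDeletedFraction(int totalRows, int deletes) {
        return 1.0 - Math.pow(1.0 - 1.0 / totalRows, deletes);
    }
}
```

For n = 100 and k = 50 this gives about 39.5%, the same ballpark as the observed ~32% (the exact number in any one run also depends on how the tool draws its keys).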
