[jira] [Updated] (HBASE-25869) WAL value compression

2021-05-21 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-25869:

Hadoop Flags: Reviewed
Release Note: 
WAL storage can be expensive, especially if the cell values represented in the 
edits are large, consisting of blobs or significant lengths of text. Such WALs 
might need to be kept around for a fairly long time to satisfy replication 
constraints on a space limited (or space-contended) filesystem.

Enable WAL compression and, with this feature, WAL value compression to save 
space in exchange for slightly higher WAL append latencies. The degree of 
performance impact depends on the compression algorithm selected. SNAPPY 
and ZSTD are the recommended algorithms when native codec support is available. 
SNAPPY may even improve overall WAL commit latency, which makes it the 
best choice. GZ is a reasonable fallback where native codec support is not 
available.

To enable WAL compression and WAL value compression, and to select the desired 
algorithm, edit your site configuration like so:



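<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>

<property>
  <name>hbase.regionserver.wal.value.enablecompression</name>
  <value>true</value>
</property>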



> Key: HBASE-25869
> URL: https://issues.apache.org/jira/browse/HBASE-25869
> Project: HBase
> Issue Type: Bug
> Components: Operability, wal
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> WAL storage can be expensive, especially if the cell values represented in 
> the edits are large, consisting of blobs or significant lengths of text. Such 
> WALs might need to be kept around for a fairly long time to satisfy 
> replication constraints on a space limited (or space-contended) filesystem.
> We have a custom dictionary compression scheme for cell metadata that is 
> engaged when WAL compression is enabled in site configuration. This is fine 
> for that application, where we can expect the universe of values and their 
> lengths in the custom dictionaries to be constrained. For arbitrary cell 
> values it is better to use one of the available compression codecs, which are 
> suitable for arbitrary albeit compressible data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25869) WAL value compression

2021-05-12 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-25869:

Description: 
WAL storage can be expensive, especially if the cell values represented in the 
edits are large, consisting of blobs or significant lengths of text. Such WALs 
might need to be kept around for a fairly long time to satisfy replication 
constraints on a space limited (or space-contended) filesystem.

We have a custom dictionary compression scheme for cell metadata that is 
engaged when WAL compression is enabled in site configuration. This is fine for 
that application, where we can expect the universe of values and their lengths 
in the custom dictionaries to be constrained. For arbitrary cell values it is 
better to use one of the available compression codecs, which are suitable for 
arbitrary albeit compressible data.
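
To illustrate why a dictionary works well when the universe of values is 
constrained, here is a minimal sketch of the general idea. It is not the HBase 
implementation: the class name MetadataDictionarySketch and its methods are 
hypothetical, entries are plain strings, and there is no eviction or size limit. 
The first time an entry is seen the writer would emit it in full, and every 
later occurrence can be replaced by its small integer index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not an HBase class.
final class MetadataDictionarySketch {
  private final Map<String, Integer> indexByEntry = new HashMap<>();
  private final List<String> entries = new ArrayList<>();

  /**
   * Returns the index of a previously seen entry, or -1 if the entry is new.
   * A new entry is remembered so that the next lookup returns its index.
   */
  int findOrAdd(String entry) {
    Integer index = indexByEntry.get(entry);
    if (index != null) {
      return index;
    }
    indexByEntry.put(entry, entries.size());
    entries.add(entry);
    return -1;
  }

  /** Resolves an index back to the entry, as a reader would on replay. */
  String get(int index) {
    return entries.get(index);
  }

  public static void main(String[] args) {
    MetadataDictionarySketch dict = new MetadataDictionarySketch();
    System.out.println(dict.findOrAdd("family1")); // new entry: write it in full
    System.out.println(dict.findOrAdd("family1")); // seen before: write the index
  }
}
```

This only pays off when the same entries recur often, which is why arbitrary 
cell values are better served by a general-purpose codec.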

  was:
WAL storage can be expensive, especially if the cell values represented in the 
edits are large, consisting of blobs or significant lengths of text. Such WALs 
might need to be kept around for a fairly long time to satisfy replication 
constraints on a space limited (or space -contended) filesystem. 

We have a custom dictionary compression scheme for cell metadata that is 
engaged when WAL compression is enabled in site configuration. This is fine for 
that application, where we can expect the universe of values (and their 
lengths) in the custom dictionaries to be constrained. For arbitrary values it 
is better to use Deflate compression, which is a complete LZ-class algorithm 
suitable for arbitrary albeit compressible data, is reasonably fast, certainly 
fast enough for WALs, compresses well, and is universally available as part of 
the Java runtime. 

With a trick that encodes whether or not the cell value is compressed in the 
high order bit of the type byte, this can be done in a backwards compatible 
manner. 







[jira] [Updated] (HBASE-25869) WAL value compression

2021-05-07 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-25869:

Status: Patch Available  (was: Open)

> WAL value compression
> -
>
> Key: HBASE-25869
> URL: https://issues.apache.org/jira/browse/HBASE-25869
> Project: HBase
> Issue Type: Bug
> Components: Operability, wal
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> WAL storage can be expensive, especially if the cell values represented in 
> the edits are large, consisting of blobs or significant lengths of text. Such 
> WALs might need to be kept around for a fairly long time to satisfy 
> replication constraints on a space limited (or space -contended) filesystem. 
> We have a custom dictionary compression scheme for cell metadata that is 
> engaged when WAL compression is enabled in site configuration. This is fine 
> for that application, where we can expect the universe of values (and their 
> lengths) in the custom dictionaries to be constrained. For arbitrary values 
> it is better to use Deflate compression, which is a complete LZ-class 
> algorithm suitable for arbitrary albeit compressible data, is reasonably 
> fast, certainly fast enough for WALs, compresses well, and is universally 
> available as part of the Java runtime. 
> With a trick that encodes whether or not the cell value is compressed in the 
> high order bit of the type byte, this can be done in a backwards compatible 
> manner. 





[jira] [Updated] (HBASE-25869) WAL value compression

2021-05-07 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-25869:

Description: 
WAL storage can be expensive, especially if the cell values represented in the 
edits are large, consisting of blobs or significant lengths of text. Such WALs 
might need to be kept around for a fairly long time to satisfy replication 
constraints on a space limited (or space -contended) filesystem. 

We have a custom dictionary compression scheme for cell metadata that is 
engaged when WAL compression is enabled in site configuration. This is fine for 
that application, where we can expect the universe of values (and their 
lengths) in the custom dictionaries to be constrained. For arbitrary values it 
is better to use Deflate compression, which is a complete LZ-class algorithm 
suitable for arbitrary albeit compressible data, is reasonably fast, certainly 
fast enough for WALs, compresses well, and is universally available as part of 
the Java runtime. 
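
As a rough sketch of what "available as part of the Java runtime" means in 
practice, the following round-trips a value through java.util.zip.Deflater and 
Inflater. The class DeflateValueSketch and its method names are hypothetical, 
and the code is not taken from the patch; it only shows that no native library 
or extra dependency is required.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration; not an HBase class.
final class DeflateValueSketch {

  /** Compresses a value with the JDK's built-in Deflate implementation. */
  static byte[] compress(byte[] value) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
    try {
      deflater.setInput(value);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        out.write(buf, 0, deflater.deflate(buf));
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release the underlying zlib resources
    }
  }

  /** Restores the original value from the output of compress(). */
  static byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && !inflater.finished()) {
          // No progress and not done: the input is truncated or needs a dictionary.
          throw new IllegalArgumentException("incomplete Deflate stream");
        }
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("corrupt Deflate stream", e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    StringBuilder text = new StringBuilder();
    for (int i = 0; i < 100; i++) {
      text.append("lorem ipsum ");
    }
    byte[] value = text.toString().getBytes(StandardCharsets.UTF_8);
    byte[] packed = compress(value);
    System.out.println(value.length + " -> " + packed.length + " bytes");
  }
}
```

Repetitive text like the sample compresses well; already-compressed or random 
values would not, which is the "albeit compressible" caveat above.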

With a trick that encodes whether or not the cell value is compressed in the 
high order bit of the type byte, this can be done in a backwards compatible 
manner. 
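
The flag-bit idea can be sketched as follows. This is an illustration under 
stated assumptions, not the code from the patch: the class TypeByteFlagSketch, 
the constant name, and the sample type code are hypothetical, and the sketch 
assumes every type code in use fits in the low seven bits so the high-order bit 
is free.

```java
// Hypothetical illustration only; not an HBase class.
final class TypeByteFlagSketch {
  /** High-order bit of the type byte, assumed unused by real type codes. */
  static final int COMPRESSED_FLAG = 0x80;

  /** Returns the type byte to write, with the flag set if the value is compressed. */
  static byte encode(byte type, boolean compressed) {
    if ((type & COMPRESSED_FLAG) != 0) {
      throw new IllegalArgumentException("type code already uses the high-order bit");
    }
    return (byte) (compressed ? (type | COMPRESSED_FLAG) : type);
  }

  /** True if the writer marked the value that follows as compressed. */
  static boolean isCompressed(byte encoded) {
    return (encoded & COMPRESSED_FLAG) != 0;
  }

  /** Recovers the original type code by masking off the flag. */
  static byte typeOf(byte encoded) {
    return (byte) (encoded & 0x7F);
  }

  public static void main(String[] args) {
    byte sampleType = 4; // arbitrary sample type code
    byte onDisk = encode(sampleType, true);
    System.out.println(isCompressed(onDisk) + " " + typeOf(onDisk));
  }
}
```

Under that assumption an uncompressed cell is written with its type byte 
unchanged, so entries written without value compression keep their existing 
encoding.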

  was:
WAL storage can be expensive, especially if the cell values represented in the 
edits are large, consisting of blobs or significant lengths of text. Such WALs 
might need to be kept around for a fairly long time to satisfy replication 
constraints on a space limited (or space -contended) filesystem. 

We have a custom dictionary compression scheme for cell metadata that is 
engaged when WAL compression is enabled in site configuration. This is fine for 
that application, where we can expect the universe of values (and their 
lengths) in the custom dictionaries to be constrained. For arbitrary values it 
is better to use Deflate compression, which is a complete LZ-class algorithm 
suitable for arbitrary albeit compressable data, is reasonably fast, certainly 
fast enough for WALs, compresses well, and is universally available as part of 
the Java runtime. 

With a trick that encodes whether or not the cell value is compressed in the 
high order bit of the type byte, this can be done in a backwards compatible 
manner. 







[jira] [Updated] (HBASE-25869) WAL value compression

2021-05-07 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-25869:

Description: 
WAL storage can be expensive, especially if the cell values represented in the 
edits are large, consisting of blobs or significant lengths of text. Such WALs 
might need to be kept around for a fairly long time to satisfy replication 
constraints on a space limited (or space -contended) filesystem. 

We have a custom dictionary compression scheme for cell metadata that is 
engaged when WAL compression is enabled in site configuration. This is fine for 
that application, where we can expect the universe of values (and their 
lengths) in the custom dictionaries to be constrained. For arbitrary values it 
is better to use Deflate compression, which is a complete LZ-class algorithm 
suitable for arbitrary albeit compressable data, is reasonably fast, certainly 
fast enough for WALs, compresses well, and is universally available as part of 
the Java runtime. 

With a trick that encodes whether or not the cell value is compressed in the 
high order bit of the type byte, this can be done in a backwards compatible 
manner. 

  was:
WAL storage can be expensive, especially if the cell values represented in the 
edits are large, consisting of blobs or significant lengths of text. Such WALs 
might need to be kept around for a fairly long time to satisfy replication 
constraints on a space limited (or space -contended) filesystem. 

We have a custom dictionary compression scheme for cell metadata that is 
engaged when WAL compression is enabled in site configuration. This is fine for 
that application, where we can expect the universe of values (and their 
lengths) in the custom dictionaries to be constrained. For arbitrary values it 
is better to use Deflate compression, which is reasonably fast, certainly fast 
enough for WALs, compresses well, and is universally available as part of the 
Java runtime. 

With a trick that encodes whether or not the cell value is compressed in the 
high order bit of the type byte, this can be done in a backwards compatible 
manner. 




