Bryan Beaudreault created HBASE-28216:
-----------------------------------------

             Summary: HDFS erasure coding support for table data dirs
                 Key: HBASE-28216
                 URL: https://issues.apache.org/jira/browse/HBASE-28216
             Project: HBase
          Issue Type: New Feature
            Reporter: Bryan Beaudreault


[Erasure coding|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html]
(EC) is a Hadoop 3 feature that can drastically reduce storage requirements,
at the expense of locality. At my company we have a few HBase clusters that
are extremely data dense and take mostly write traffic with few reads (cold
data). We'd like to reduce the cost of these clusters, and EC is a great way to
do that, since it can reduce replication-related storage costs by 50%.
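
To make the 50% figure concrete, here is a minimal sketch of the arithmetic. It
assumes the baseline is 3x replication and the EC policy is Reed-Solomon with 6
data and 3 parity units (for example RS-6-3-1024k); neither is stated above,
and the class and method names are made up for illustration.

```java
/** Illustrative arithmetic only: raw bytes stored per logical byte. */
final class EcStorageMath {
  /** Footprint under N-way replication: every byte is stored N times. */
  static double replicationFootprint(int replicas) {
    return replicas;
  }

  /** Footprint under Reed-Solomon: parity units are added per group of data units. */
  static double erasureCodedFootprint(int dataUnits, int parityUnits) {
    return (double) (dataUnits + parityUnits) / dataUnits;
  }

  /** Fraction of raw storage saved when moving from one footprint to another. */
  static double savings(double before, double after) {
    return 1.0 - after / before;
  }

  public static void main(String[] args) {
    double replicated = replicationFootprint(3);   // 3.0 (assumed baseline)
    double rs63 = erasureCodedFootprint(6, 3);     // 1.5 (assumed RS 6+3 policy)
    System.out.println("replicated footprint: " + replicated);
    System.out.println("RS 6+3 footprint:     " + rs63);
    System.out.println("fraction saved:       " + savings(replicated, rs63));
  }
}
```

This only models full stripes; small files and partially filled stripes will
save less, so treat the result as an upper bound under these assumptions.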

It's possible to enable EC policies on subdirectories of HDFS. One can
manually set this with {{hdfs ec -setPolicy -path
/hbase/data/default/usertable -policy xxxx}}. This can work without any HBase
support.

One problem with that approach is that operators lack visibility into which
tables have EC enabled. I think this is where HBase can help. Here's my proposal:
 * Add a new TableDescriptor and ColumnDescriptor field ERASURE_CODING_POLICY
 * In ModifyTableProcedure preflightChecks, if ERASURE_CODING_POLICY is set, 
verify that the requested policy is available and enabled via 
DistributedFileSystem.getErasureCodingPolicies().
 * During ModifyTableProcedure, add a new state for 
MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY.
 ** When adding or changing a policy, use 
DistributedFileSystem.setErasureCodingPolicy to sync it for the data and 
archive dir of that table (or column in table).
 ** When removing the property or setting it to empty, use 
DistributedFileSystem.unsetErasureCodingPolicy to remove it from the data and 
archive dir.
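
The preflight check and sync step in the list above could look roughly like the
sketch below. This is not HBase code: EcFileSystem is a hypothetical stand-in
for the DistributedFileSystem calls, directories are plain strings rather than
Path objects, and ErasureCodingSync and RecordingEcFileSystem are names
invented here so the logic can run without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the few DistributedFileSystem calls named in the proposal. */
interface EcFileSystem {
  Set<String> enabledPolicyNames();
  void setErasureCodingPolicy(String dir, String policyName);
  void unsetErasureCodingPolicy(String dir);
}

/** Hypothetical helper; not an existing HBase class. */
final class ErasureCodingSync {
  private static boolean isSet(String policy) {
    return policy != null && !policy.isEmpty();
  }

  /** Preflight: reject a requested policy that the file system has not enabled. */
  static void preflightCheck(EcFileSystem fs, String requested) {
    if (!isSet(requested)) {
      return; // removing the property needs no validation
    }
    if (!fs.enabledPolicyNames().contains(requested)) {
      throw new IllegalArgumentException(
          "Erasure coding policy is not available or not enabled: " + requested);
    }
  }

  /** Sync: set the new policy on each dir, or unset it when the property was cleared. */
  static void sync(EcFileSystem fs, String oldPolicy, String newPolicy, List<String> dirs) {
    if (isSet(newPolicy) && !newPolicy.equals(oldPolicy)) {
      for (String dir : dirs) {
        fs.setErasureCodingPolicy(dir, newPolicy);
      }
    } else if (!isSet(newPolicy) && isSet(oldPolicy)) {
      for (String dir : dirs) {
        fs.unsetErasureCodingPolicy(dir);
      }
    }
    // Otherwise the policy is unchanged and there is nothing to sync.
  }
}

/** In-memory fake that records calls, so the sketch runs without HDFS. */
final class RecordingEcFileSystem implements EcFileSystem {
  final List<String> calls = new ArrayList<>();
  private final Set<String> enabled;

  RecordingEcFileSystem(Set<String> enabled) {
    this.enabled = enabled;
  }

  @Override
  public Set<String> enabledPolicyNames() {
    return enabled;
  }

  @Override
  public void setErasureCodingPolicy(String dir, String policyName) {
    calls.add("set " + dir + " " + policyName);
  }

  @Override
  public void unsetErasureCodingPolicy(String dir) {
    calls.add("unset " + dir);
  }
}
```

The dirs list would hold the table's data dir and its archive dir (or the
column-level dirs beneath them); how those paths are resolved is left out here.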

Since this API exists only in Hadoop 3, we'll need to add a reflection wrapper 
class for managing the calls and verifying that the API is available. We'll 
do the same availability check in preflightChecks.
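
As a rough idea of what such a reflection wrapper could look like, here is a
minimal sketch. ReflectiveApiGuard is a hypothetical name, and the class and
method to look up are passed in as strings so that nothing Hadoop-specific has
to be on the compile-time classpath.

```java
import java.lang.reflect.Method;

/** Hypothetical reflection guard: resolves a method once and reports whether it exists. */
final class ReflectiveApiGuard {
  private final Method method; // null when the class or method is missing

  ReflectiveApiGuard(String className, String methodName, Class<?>... parameterTypes) {
    Method found = null;
    try {
      found = Class.forName(className).getMethod(methodName, parameterTypes);
    } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
      // API not present on this classpath; leave found as null.
    }
    this.method = found;
  }

  /** True when the target API was found, so callers can check before invoking. */
  boolean isAvailable() {
    return method != null;
  }

  /** Invokes the resolved method, or fails fast when the API is missing. */
  Object invoke(Object target, Object... args) throws ReflectiveOperationException {
    if (method == null) {
      throw new UnsupportedOperationException("API is not available on this classpath");
    }
    return method.invoke(target, args);
  }
}
```

For the EC calls, the wrapper would be constructed with
org.apache.hadoop.hdfs.DistributedFileSystem and a method name such as
setErasureCodingPolicy, with the parameter classes (for example
org.apache.hadoop.fs.Path) also resolved by name. preflightChecks could then
fail early when isAvailable() returns false.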



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
