Bryan Beaudreault created HBASE-28216: -----------------------------------------
Summary: HDFS erasure coding support for table data dirs Key: HBASE-28216 URL: https://issues.apache.org/jira/browse/HBASE-28216 Project: HBase Issue Type: New Feature Reporter: Bryan Beaudreault [Erasure coding|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html] (EC) is a hadoop-3 feature which can drastically reduce storage requirements, at the expense of locality. At my company we have a few hbase clusters which are extremely data dense and take mostly write traffic, fewer reads (cold data). We'd like to reduce the cost of these clusters, and EC is a great way to do that since it can reduce replication related storage costs by 50%. It's possible to enable EC policies on sub directories of HDFS. One can manually set this with {{{}hdfs ec -setPolicy -path /hbase/data/default/usertable -policy xxxx{}}}. This can work without any hbase support. One problem with that is a lack of visibility by operators into which tables might have EC enabled. I think this is where HBase can help. Here's my proposal: * Add a new TableDescriptor and ColumnDescriptor field ERASURE_CODING_POLICY * In ModifyTableProcedure preflightChecks, if ERASURE_CODING_POLICY is set, verify that the requested policy is available and enabled via DistributedFileSystem. getErasureCodingPolicies(). * During ModifyTableProcedure, add a new state for MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY. ** When adding or changing a policy, use DistributedFileSystem. setErasureCodingPolicy to sync it for the data and archive dir of that table (or column in table) ** When removing the property or setting it to empty, use DistributedFileSystem. unsetErasureCodingPolicy to remove it from the data and archive dir. Since this new API is in hadoop-3 only, we'll need to add a reflection wrapper class for managing the calls and verifying that the API is available. We'll similarly do that API check in preflightChecks. -- This message was sent by Atlassian Jira (v8.20.10#820010)