[DESIGN] Erasure Coding

Elek, Marton Tue, 23 Jun 2020 00:58:03 -0700

The first version of the Erasure Coding design doc is published and ready for comments:


https://issues.apache.org/jira/browse/HDDS-3816



As it's a long document, I will try to summarize it:

* EC will be automatic and turned on by default. Cold data will be encoded by default in the background (async).

* EC can be set at the storage-class level (which means that keys and buckets can be assigned to different policies)


There are two main implementation options:


 1. Container level EC

 * Easier to implement, less data movement, original containers are not modified, data locality is supported

 * But the implementation of delete and online recovery is tricky and less efficient




 2. Block level, striped EC (similar to HDFS)

 * Delete / online-recovery are easier

 * Needs to rewrite all the data and update OM; no data locality
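
To make the striped option more concrete, here is a minimal sketch of why striping rewrites the data: a key's bytes are cut into fixed-size cells, spread across data units, and a parity cell is computed per stripe. Everything here is an assumption for illustration only: the names (`CELL_SIZE`, `encode_stripe`, `recover_cell`), the 3+1 layout, and the use of a single XOR parity instead of a real codec such as Reed-Solomon. It is not taken from the design doc or from the Ozone code base.

```python
from functools import reduce

CELL_SIZE = 4  # bytes per cell; hypothetical value, real cells are far larger


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized cells."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_stripe(data: bytes, data_units: int = 3) -> list:
    """Split one stripe into `data_units` cells and append one XOR parity cell."""
    capacity = CELL_SIZE * data_units
    if len(data) > capacity:
        raise ValueError("data does not fit into a single stripe")
    padded = data.ljust(capacity, b"\0")  # zero-pad a short final stripe
    cells = [padded[i * CELL_SIZE:(i + 1) * CELL_SIZE] for i in range(data_units)]
    return cells + [reduce(xor_bytes, cells)]


def recover_cell(cells: list, lost_index: int) -> bytes:
    """Rebuild one lost cell (data or parity) by XOR-ing all surviving cells."""
    survivors = [c for i, c in enumerate(cells) if i != lost_index]
    return reduce(xor_bytes, survivors)


stripe = encode_stripe(b"ozone-ec-123")
rebuilt = recover_cell(stripe, 1)  # pretend the second data cell was lost
```

In this sketch, consecutive cells of one key land on different units, which is why striping gives up data locality, while recovery only needs the surviving cells of the affected stripe.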



Next steps:

 * The proposal will be presented to explain the options

 * We can also improve it to collect more aspects for evaluating the options


 * We need to agree on the long-term vision:

   1. If we would like to support direct EC write (long-term), the second option may be better, but it means a totally new pipeline / write method.

   2. With the current model (Ratis for write --> EC later), the first option may be easier.

   3. The storage class abstraction can help to define an API which can support both (we can implement them in different phases)

Big thanks to Uma and Stephen for writing parts of the design doc, and to Prashant, Jitendra, and Arpit for the early review / questions / comments (sorry if I missed somebody).


Marton

