The first version of the Erasure Coding design doc is published and ready for comments:

https://issues.apache.org/jira/browse/HDDS-3816


As it's a long document, I will try to summarize it:

* EC will be automatic and turned on by default. Cold data will be encoded by default in the background (async).

* EC can be set at the storage-class level (which means that keys and buckets can be assigned to different policies).

There are two main implementation options:


 1. Container level EC

  * Easier to implement, less data movement, original containers are not modified, data locality is supported

  * But the implementation of delete and online recovery is tricky and less efficient



 2. Block level, striped EC (similar to HDFS)

  * Delete and online recovery are easier

  * Needs to rewrite all the data and update OM; no data locality
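
To make the "striped" layout of option 2 more concrete, here is a minimal sketch. It is not taken from the design doc: the cell size, the number of data units and all function names are assumptions for illustration, and a single XOR parity block stands in for the Reed-Solomon style codes a real implementation would more likely use. It shows why the data has to be rewritten (cells of one key end up spread over several blocks, so there is no data locality) and why online recovery is straightforward (a lost block is rebuilt from the survivors of the same stripe group).

```python
from __future__ import annotations

from functools import reduce

CELL_SIZE = 4   # bytes per cell; tiny on purpose (assumption for illustration)
DATA_UNITS = 3  # data blocks per stripe group (assumption for illustration)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes) -> list[bytes]:
    """Distribute data round-robin, cell by cell, over DATA_UNITS blocks.

    The input is zero-padded to a whole number of stripes.
    """
    stripe_size = CELL_SIZE * DATA_UNITS
    padded = data + b"\0" * (-len(data) % stripe_size)
    blocks = [bytearray() for _ in range(DATA_UNITS)]
    for offset in range(0, len(padded), CELL_SIZE):
        target = (offset // CELL_SIZE) % DATA_UNITS
        blocks[target] += padded[offset:offset + CELL_SIZE]
    return [bytes(b) for b in blocks]


def parity(blocks: list[bytes]) -> bytes:
    """Single XOR parity block over all data blocks (stand-in for a real code)."""
    return reduce(xor_bytes, blocks)


def recover(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Rebuild the one missing data block from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity_block)


if __name__ == "__main__":
    blocks = stripe(b"abcdefghijklmnopqrstuvwx")
    p = parity(blocks)
    # Pretend the second block was lost and rebuild it.
    rebuilt = recover([blocks[0], blocks[2]], p)
    print(blocks, rebuilt == blocks[1])
```

With container level EC (option 1), by contrast, the parity would be computed over whole existing containers, so the original containers stay untouched.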



Next steps:

 * The proposal will be presented to explain the options

 * We can also improve it by collecting more aspects for the evaluation of the options

 * We need to agree on the long-term vision:

   1. If we would like to support direct EC writes in the long term, the second option may be better, but it requires a completely new pipeline / write method.

   2. With the current model (Ratis for write --> EC later), the first option may be easier.

   3. The storage class abstraction can help to define an API which supports both (we can implement them in different phases).
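
As a rough illustration of the third point, a storage class could carry both the scheme used at write time and the scheme a key moves to once it becomes cold. Everything in this sketch is hypothetical (the names `StorageClass`, `Scheme`, `scheme_for` and `effective_storage_class`, and the key-over-bucket-over-default precedence are assumptions, not the API proposed in the design doc); it only shows that one abstraction can describe "Ratis for write, EC later" as well as direct EC write.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scheme(Enum):
    """Hypothetical labels for how data is physically stored."""
    RATIS_THREE = "RATIS/THREE"    # replicated write path (current model)
    EC_CONTAINER = "EC/CONTAINER"  # option 1: container level EC
    EC_STRIPED = "EC/STRIPED"      # option 2: block level, striped EC


@dataclass(frozen=True)
class StorageClass:
    """Hypothetical storage class: a write scheme plus an optional cold scheme."""
    name: str
    write_scheme: Scheme
    cold_scheme: Optional[Scheme] = None  # None means data is never re-encoded

    def scheme_for(self, is_cold: bool) -> Scheme:
        """Return the scheme that applies to data in the given state."""
        if is_cold and self.cold_scheme is not None:
            return self.cold_scheme
        return self.write_scheme


def effective_storage_class(
    key_class: Optional[StorageClass],
    bucket_class: Optional[StorageClass],
    default: StorageClass,
) -> StorageClass:
    """Assumed precedence: key setting, then bucket setting, then default."""
    return key_class or bucket_class or default


# "Ratis for write --> EC later" with container level EC in the background.
STANDARD = StorageClass("STANDARD", Scheme.RATIS_THREE, Scheme.EC_CONTAINER)
# Direct EC write: the data is striped at write time, nothing to convert later.
DIRECT_EC = StorageClass("DIRECT_EC", Scheme.EC_STRIPED)

if __name__ == "__main__":
    chosen = effective_storage_class(None, DIRECT_EC, STANDARD)
    print(chosen.name, chosen.scheme_for(is_cold=True).value)
```

Under these assumptions, the two implementation options differ only in which `Scheme` values a storage class refers to, which is what would allow them to be implemented in different phases.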



Big thanks to Uma and Stephen for writing parts of the design doc, and to Prashant, Jitendra, and Arpit for the early reviews, questions, and comments (sorry if I missed anybody).

Marton

