The first version of the Erasure Coding design doc is published and ready
for comments:
https://issues.apache.org/jira/browse/HDDS-3816
As it's a long document, I will try to summarize it:
* EC will be automatic and turned on by default. Cold data will be
  encoded by default in the background (async).
* EC can be set at the storage-class level (which means that keys and
  buckets can be assigned to different policies).
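To make the storage-class idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the class names, the `cold_policy` field, and the rule that a key-level setting overrides the bucket-level one are not taken from the design doc.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageClass:
    """Hypothetical storage class: how data is written, and how it is kept when cold."""
    name: str
    write_replication: str       # replication used on the write path
    cold_policy: Optional[str]   # background encoding for cold data, None = never encode


# Illustrative classes only; the real names and policies are up to the design.
STORAGE_CLASSES = {
    "STANDARD": StorageClass("STANDARD", "RATIS/THREE", "EC"),
    "REPLICATED_ONLY": StorageClass("REPLICATED_ONLY", "RATIS/THREE", None),
}


def effective_storage_class(bucket_class: Optional[str],
                            key_class: Optional[str] = None) -> StorageClass:
    """Assumed precedence: key setting, then bucket setting, then the default."""
    name = key_class or bucket_class or "STANDARD"
    return STORAGE_CLASSES[name]
```

With this sketch, a key in a `STANDARD` bucket would be encoded in the background by default, while a key explicitly assigned to `REPLICATED_ONLY` would stay replicated.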
There are two main implementation options:
1. Container level EC
   * Easier to implement, less data movement, original containers are
     not modified, data locality is supported
   * But the implementation of delete and on-line recovery is tricky and
     less efficient
2. Block level, striped EC (similar to HDFS)
   * Delete / online-recovery are easier
   * Needs to rewrite all the data, and update OM, no data locality
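The difference between the two options can be sketched with a toy example. This is not the proposed implementation: it uses a single XOR parity as a stand-in for a real erasure code such as Reed-Solomon, and the function names, padding scheme, and cell size are assumptions made for illustration.

```python
from functools import reduce


def xor_parity(chunks):
    """XOR equal-length byte strings into one parity chunk."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def pad(chunks):
    """Zero-pad chunks to the length of the longest one."""
    size = max(len(c) for c in chunks)
    return [c.ljust(size, b"\0") for c in chunks]


def stripe(data, k, cell):
    """Option 2: deal fixed-size cells round-robin across k data blocks."""
    blocks = [bytearray() for _ in range(k)]
    for i in range(0, len(data), cell):
        blocks[(i // cell) % k] += data[i:i + cell]
    return [bytes(b) for b in blocks]


# Option 1: existing containers stay untouched, parity is computed across them.
containers = [b"aaaa", b"bbbbbb", b"cc"]
container_parity = xor_parity(pad(containers))

# Losing containers[1]: rebuild it from the parity and the surviving containers.
recovered = xor_parity(pad([container_parity, containers[0], containers[2]]))
recovered = recovered[:len(containers[1])]

# Option 2: the key's bytes are rewritten into striped blocks plus parity.
striped_blocks = stripe(b"abcdefgh", k=2, cell=2)
striped_parity = xor_parity(pad(striped_blocks))
```

In the container-level case the original bytes of each container remain readable in one place, which is where the data locality comes from. In the striped case the bytes of a single key are spread across all data blocks, so every key has to be rewritten and its block locations updated.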
Next steps:
* The proposal will be presented to explain the options.
* We can also improve it to collect more aspects for the evaluation of
  the options.
* We need to agree on the long-term vision:
  1. If we would like to support direct EC write (long-term), the second
     option can be better, but it means a totally new pipeline / write
     method.
  2. With the current model (Ratis for write --> EC later), the first
     option can be easier.
  3. The storage class abstraction can help to define an API which can
     support both (we can implement them in different phases).
Big thanks to Uma and Stephen for writing parts of the design doc, and to
Prashant, Jitendra, and Arpit for the early review / questions / comments
(sorry if I missed anybody).
Marton