[
https://issues.apache.org/jira/browse/HDDS-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143863#comment-17143863
]
Yiqun Lin commented on HDDS-3816:
---------------------------------
Hi [~umamaheswararao], the design doc looks great. I went through the whole design
today and have some comments.
The design doc introduces Container-level and Block-level EC implementations and
their corresponding advantages/disadvantages, but it doesn't mention which one is
the final choice. Or does that mean we want to implement both of them and let
users choose whichever they prefer?
The Container-level option will be easier to implement than the Block-level
option. But as the design doc also mentions, this option has more impact, for
example on the delete operation (we would additionally need to implement
small-container merging), on data recovery cost, and a higher risk of data loss
when some nodes crash. In my personal opinion, the Block-level option is the more
complete and robust implementation. What do we think of this?
For the read/write performance comparison, Block-level EC will perform better. A
block is split across multiple nodes as striped storage, so we can read/write the
data in parallel. At the Container level, the block data structure inside a
Container is actually unchanged: it still keeps the contiguous layout and only
takes a striped form at the Container level. So the read/write rate is
essentially unchanged under Container-level EC; we still need to find the one
specific Container node to read/write a specific block's data.
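To make the contrast concrete, here is a minimal sketch (all names and the cell
size are hypothetical, not Ozone code) of how a striped layout rotates a logical
byte offset across multiple data nodes, while a contiguous layout keeps every
offset of a block on the single node hosting its container:

```java
// Hypothetical illustration of striped vs. contiguous offset-to-node mapping.
// Assumes a 1 MB striping cell and 6 data blocks (parity nodes ignored).
public class LayoutSketch {
    static final long CELL_SIZE = 1024 * 1024; // bytes per striping cell
    static final int DATA_NODES = 6;           // e.g. RS(6,3): 6 data + 3 parity

    // Striped (Block-level) layout: consecutive cells rotate across data
    // nodes, so a large sequential read can hit many nodes in parallel.
    static int stripedNodeIndex(long logicalOffset) {
        long cellIndex = logicalOffset / CELL_SIZE;
        return (int) (cellIndex % DATA_NODES);
    }

    // Contiguous (Container-level) layout: every offset of a block stays
    // on the one node that hosts the block's container.
    static int contiguousNodeIndex(long logicalOffset, int containerNode) {
        return containerNode;
    }

    public static void main(String[] args) {
        // A 6 MB sequential read under striping spans all 6 data nodes...
        for (long off = 0; off < 6 * CELL_SIZE; off += CELL_SIZE) {
            System.out.println("offset " + off + " -> node " + stripedNodeIndex(off));
        }
        // ...while under the contiguous layout it stays on one node.
        System.out.println("contiguous -> node " + contiguousNodeIndex(0, 2));
    }
}
```

This is only to illustrate why striping enables parallel I/O while a
Container-level striped form does not change per-block read/write behavior.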
What's the implementation complexity of these two options? For example, can we
integrate the current HDFS EC algorithm implementation into Ozone directly? To
support EC, will there be a large code refactor in the current read/write
implementation?
I see the current EC design depends on the abstraction of a storage-class
implementation. I'm not sure this is an easy thing to do at the beginning of the
Ozone EC work. The storage-class implementation is itself a large feature, I
think: we define data storage types, policies, and multiple rules to let the
system do the data transformation automatically and transparently. This is
similar to the SSM (Smart Storage Management) feature design in HDFS-7343. I
don't mean to disagree with storage-class, but I have a concern about treating it
as something we must implement first.
Please correct me if I am wrong, thanks.
> Erasure Coding in Apache Hadoop Ozone
> -------------------------------------
>
> Key: HDDS-3816
> URL: https://issues.apache.org/jira/browse/HDDS-3816
> Project: Hadoop Distributed Data Store
> Issue Type: New Feature
> Components: SCM
> Reporter: Uma Maheswara Rao G
> Priority: Major
> Attachments: Erasure Coding in Apache Hadoop Ozone.pdf
>
>
> We propose to implement Erasure Coding in Apache Hadoop Ozone to provide
> efficient storage. With EC in place, Ozone can provide the same or better fault
> tolerance while giving 50% or more storage space savings.
> In the HDFS project, we already have native codecs (ISA-L) and Java codecs
> implemented; we can leverage the same or a similar codec design.
> However, the critical part, the EC data layout design, is in progress; we will
> post the design doc soon.
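As a side note on the savings figure in the quoted description, the arithmetic
can be checked with a small sketch (assuming the common 3x replication baseline
and an RS(6,3) scheme; the actual schemes are up to the design):

```java
// Raw-storage overhead of replication vs. Reed-Solomon erasure coding.
public class EcOverhead {
    // Raw bytes stored per logical byte under N-way replication.
    static double replicationOverhead(int replicas) {
        return replicas; // every byte is stored 'replicas' times
    }

    // Raw bytes stored per logical byte under RS(data, parity) striping.
    static double ecOverhead(int data, int parity) {
        return (data + parity) / (double) data;
    }

    public static void main(String[] args) {
        double rep = replicationOverhead(3); // 3.0x raw storage
        double ec = ecOverhead(6, 3);        // 1.5x raw storage
        double savings = 1.0 - ec / rep;     // fraction of raw storage saved
        System.out.printf("replication=%.1fx ec=%.1fx savings=%.0f%%%n",
                rep, ec, savings * 100);
    }
}
```

Under these assumptions RS(6,3) stores 1.5x the logical data versus 3.0x for
triple replication, i.e. a 50% raw-storage saving, while still tolerating the
loss of any three nodes in a stripe.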
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]