[
https://issues.apache.org/jira/browse/HDDS-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143863#comment-17143863
]
Yiqun Lin commented on HDDS-3816:
---------------------------------
Hi [~umamaheswararao], the design doc looks great. I went through the whole design
today and have some comments.
The design doc introduces Container-level and Block-level EC implementations and
their corresponding advantages/disadvantages, but it doesn't mention which one is
the final choice. Or does that mean we want to implement both of them and let
users choose whichever they prefer?
The Container-level option will be easier to implement than the Block-level
option. But as the design doc also mentions, this option has more impact, for
example on the delete operation (we would additionally need to implement
small-container merging), on data recovery cost, and a higher risk of data loss
when some nodes crash. In my personal opinion, the Block-level option is the more
complete and robust implementation. What do we think of this?
For the read/write performance comparison, Block-level EC will perform better. A
block is split across multiple nodes as striped storage, so we can read/write the
data in parallel. At the Container level, the block data structure inside a
Container is actually unchanged: it still keeps the contiguous layout and only
takes a striped form at the Container level. So the read/write rate is
essentially unchanged under Container-level EC; we still need to find the one
specific Container node to read/write a specific block's data.
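To make the contrast concrete, here is a minimal sketch (all names and the cell
size are hypothetical, not Ozone code) of how a striped layout rotates a logical
byte offset across multiple data nodes, while a contiguous layout keeps every
offset of a block on the single node hosting its container:

```java
// Hypothetical illustration of striped vs. contiguous offset-to-node mapping.
// Assumes a 1 MB striping cell and 6 data blocks (parity nodes ignored).
public class LayoutSketch {
    static final long CELL_SIZE = 1024 * 1024; // bytes per striping cell
    static final int DATA_NODES = 6;           // e.g. RS(6,3): 6 data + 3 parity

    // Striped (Block-level) layout: consecutive cells rotate across data
    // nodes, so a large sequential read can hit many nodes in parallel.
    static int stripedNodeIndex(long logicalOffset) {
        long cellIndex = logicalOffset / CELL_SIZE;
        return (int) (cellIndex % DATA_NODES);
    }

    // Contiguous (Container-level) layout: every offset of a block stays
    // on the one node that hosts the block's container.
    static int contiguousNodeIndex(long logicalOffset, int containerNode) {
        return containerNode;
    }

    public static void main(String[] args) {
        // A 6 MB sequential read under striping spans all 6 data nodes...
        for (long off = 0; off < 6 * CELL_SIZE; off += CELL_SIZE) {
            System.out.println("offset " + off + " -> node " + stripedNodeIndex(off));
        }
        // ...while under the contiguous layout it stays on one node.
        System.out.println("contiguous -> node " + contiguousNodeIndex(0, 2));
    }
}
```

This is only to illustrate why striping enables parallel I/O while a
Container-level striped form does not change per-block read/write behavior.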
What's the implementation complexity of these two options? For example, can we
integrate the current HDFS EC algorithm implementation into Ozone directly? To
support EC, will there be a large code refactor in the current read/write
implementation?
I see the current EC design depends on the abstraction of a storage-class
implementation. I'm not sure this is an easy thing to do at the beginning of the
Ozone EC work. The storage-class implementation is itself a large feature, I
think: we define data storage types, policies, and multiple rules to let the
system do the data transformation automatically and transparently. This is
similar to the SSM (Smart Storage Management) feature design in HDFS-7343. I
don't mean to disagree with storage-class, but I have a concern about treating it
as something we must implement first.
Please correct me if I am wrong, thanks.
> Erasure Coding in Apache Hadoop Ozone
> -------------------------------------
>
> Key: HDDS-3816
> URL: https://issues.apache.org/jira/browse/HDDS-3816
> Project: Hadoop Distributed Data Store
> Issue Type: New Feature
> Components: SCM
> Reporter: Uma Maheswara Rao G
> Priority: Major
> Attachments: Erasure Coding in Apache Hadoop Ozone.pdf
>
>
> We propose to implement Erasure Coding in Apache Hadoop Ozone to provide
> efficient storage. With EC in place, Ozone can provide the same or better fault
> tolerance while giving 50% or more storage space savings.
> In the HDFS project, we already have native codecs (ISA-L) and Java codecs
> implemented; we can leverage the same or a similar codec design.
> However, the critical part, the EC data layout design, is in progress; we will
> post the design doc soon.
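As a side note on the savings figure in the quoted description, the arithmetic
can be checked with a small sketch (assuming the common 3x replication baseline
and an RS(6,3) scheme; the actual schemes are up to the design):

```java
// Raw-storage overhead of replication vs. Reed-Solomon erasure coding.
public class EcOverhead {
    // Raw bytes stored per logical byte under N-way replication.
    static double replicationOverhead(int replicas) {
        return replicas; // every byte is stored 'replicas' times
    }

    // Raw bytes stored per logical byte under RS(data, parity) striping.
    static double ecOverhead(int data, int parity) {
        return (data + parity) / (double) data;
    }

    public static void main(String[] args) {
        double rep = replicationOverhead(3); // 3.0x raw storage
        double ec = ecOverhead(6, 3);        // 1.5x raw storage
        double savings = 1.0 - ec / rep;     // fraction of raw storage saved
        System.out.printf("replication=%.1fx ec=%.1fx savings=%.0f%%%n",
                rep, ec, savings * 100);
    }
}
```

Under these assumptions RS(6,3) stores 1.5x the logical data versus 3.0x for
triple replication, i.e. a 50% raw-storage saving, while still tolerating the
loss of any three nodes in a stripe.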
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]