[ 
https://issues.apache.org/jira/browse/HDFS-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654209#comment-17654209
 ] 

Jing Zhao commented on HDFS-16875:
----------------------------------

Posted the design doc for the EC access proxy.

> Erasure Coding: data access proxy to allow old clients to read EC data
> ----------------------------------------------------------------------
>
>                 Key: HDFS-16875
>                 URL: https://issues.apache.org/jira/browse/HDFS-16875
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ec, erasure-coding
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>            Priority: Major
>         Attachments: Erasure Coding Access Proxy.pdf
>
>
> Erasure Coding is supported only in Hadoop 3, while many production
> deployments still depend on Hadoop 2. Upgrading the whole data tech stack to
> Hadoop 3 may involve a large migration effort and even reliability risks,
> given the incompatibilities between the two major releases and the
> potential for undiscovered issues in newer releases. Therefore, we need a
> solution that adopts Erasure Coding for cost efficiency with the least
> migration effort and risk, while still allowing older HDFS clients
> (Hadoop 2.x) to access EC data transparently.
> Internally we have developed an EC access proxy that translates EC data
> for old clients. We also extended the NameNode RPC so that it can
> distinguish HDFS clients with and without EC support and redirect the old
> clients to the proxy. With the proxy we set up separate Erasure Coding
> clusters storing hundreds of PB of data, while leaving other production
> clusters and all the upper-layer applications untouched.
> Because some of the changes touch fundamental components of HDFS (e.g., the
> client-NN RPC header), we do not aim to merge them into trunk. We will
> use this ticket to share the design and implementation details (including the
> code) and collect feedback. We may later open-source the implementation in a
> separate GitHub repo.



