Re: HADOOP-15558 review

2018-07-05 Thread Chaitanya M V S
Adding Shreya and other teammates to the thread.

Regards,
Chaitanya


On Thu, Jul 5, 2018 at 3:30 PM Chaitanya M V S 
wrote:

> Hello,
>
> We are a group of interns at IISc, Bangalore, and have tried to implement Clay
> Codes <https://www.usenix.org/conference/fast18/presentation/vajha> in
> Hadoop using the pluggable erasure codec API (HDFS-7337).
>
> Clay codes are erasure codes with several must-have properties that make
> them well suited for practical use in distributed systems.
> They are known to reduce network bandwidth (the amount of data transmitted
> on a single node failure), decrease repair times and improve I/O
> performance.
> Clay codes build on the underlying implementation of RS codes (in fact,
> any MDS code), which makes them attractive and easy to use and extend.
>
> We have put forward both a design doc and a patch at HADOOP-15558
> <https://issues.apache.org/jira/browse/HADOOP-15558>. We would be
> grateful if someone could review and suggest further improvements.
>
> *P.S.* Clay codes have also been implemented and are under review in Ceph
> <https://github.com/ceph/ceph/pull/14300>.
>
>
> Regards,
> M.V.S.Chaitanya & Shreya Gupta
>


HADOOP-15558 review

2018-07-05 Thread Chaitanya M V S
Hello,

We are a group of interns at IISc, Bangalore, and have tried to implement Clay
Codes <https://www.usenix.org/conference/fast18/presentation/vajha> in
Hadoop using the pluggable erasure codec API (HDFS-7337).
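
For reviewers unfamiliar with the plugin surface, a minimal sketch of the shape
our factory takes is below. The interface and method names are from our reading
of trunk's RawErasureCoderFactory, so treat the exact signatures as approximate;
the coder/codec names and the delegation to the pure-Java RS coders are
placeholders for illustration only (the actual patch wires in our Clay
encoder/decoder there).

    package org.apache.hadoop.io.erasurecode.rawcoder;

    import org.apache.hadoop.io.erasurecode.ErasureCoderOptions;

    // Sketch only: a raw coder factory that the CodecRegistry can discover via
    // ServiceLoader once it is listed under
    // META-INF/services/org.apache.hadoop.io.erasurecode.rawcoder.RawErasureCoderFactory.
    public class ClayCodeRawErasureCoderFactory implements RawErasureCoderFactory {

      @Override
      public RawErasureEncoder createEncoder(ErasureCoderOptions options) {
        // Placeholder: the real factory returns our Clay encoder, which layers
        // the pairwise coupling on top of an underlying RS (or any MDS) encoder.
        return new RSRawEncoder(options);
      }

      @Override
      public RawErasureDecoder createDecoder(ErasureCoderOptions options) {
        // Placeholder: likewise returns our Clay decoder in the actual patch.
        return new RSRawDecoder(options);
      }

      @Override
      public String getCoderName() {
        return "clay_java";   // illustrative name only
      }

      @Override
      public String getCodecName() {
        return "clay";        // illustrative name only
      }
    }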

Clay codes are erasure codes with several must-have properties that make
them well suited for practical use in distributed systems.
They are known to reduce network bandwidth (the amount of data transmitted
on a single node failure), decrease repair times and improve I/O
performance.
Clay codes build on the underlying implementation of RS codes (in fact,
any MDS code), which makes them attractive and easy to use and extend.
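
To put a rough number on the bandwidth claim, here is a back-of-the-envelope
comparison for repairing a single lost block, assuming the standard MSR repair
bound with all n-1 surviving nodes acting as helpers and an illustrative 128 MB
block; these are not measurements from our patch.

    public class RepairTrafficSketch {
      public static void main(String[] args) {
        int k = 10, m = 4, n = k + m;   // e.g. a (14, 10) code
        double blockMB = 128.0;         // assumed per-node block size

        // Reed-Solomon repair: read k whole blocks to rebuild one lost block.
        double rsTrafficMB = k * blockMB;

        // MSR (Clay) repair with d = n - 1 helpers: each helper ships only a
        // 1/(n - k) fraction of its block.
        double clayTrafficMB = (n - 1) * blockMB / (n - k);

        System.out.printf("RS repair traffic:   %.0f MB%n", rsTrafficMB);   // 1280 MB
        System.out.printf("Clay repair traffic: %.0f MB%n", clayTrafficMB); // 416 MB
      }
    }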

We have put forward both a design doc and a patch at HADOOP-15558
<https://issues.apache.org/jira/browse/HADOOP-15558>. We would be grateful
if someone could review and suggest further improvements.

*P.S.* Clay codes have also been implemented and are under review in Ceph
<https://github.com/ceph/ceph/pull/14300>.


Regards,
M.V.S.Chaitanya & Shreya Gupta


Regarding Hadoop Erasure Coding architecture

2018-06-14 Thread Chaitanya M V S
Hi!

We are a group of people trying to understand the architecture of erasure
coding in Hadoop 3.0. We have been facing difficulties understanding a few
terms and concepts, listed below.

1. What do the terms Block, Block Group, Stripe, Cell and Chunk mean in the
context of erasure coding (these terms have taken on different meanings and
have been used interchangeably across various documentation and blogs)? How
have they been incorporated into the reading and writing of EC data? (We
sketch our current understanding of the layout after the questions below.)

2. How has the idea/concept of a block from previous versions been carried
over to EC?

3. The higher-level APIs, namely ErasureCoders and ErasureCodec, still
haven't been plugged into Hadoop, and I haven't found any new JIRA
regarding them. Are there any updates or pointers regarding the
incorporation of these APIs into Hadoop?

4. How is the datanode for reconstruction work chosen? Also, how are the
buffer sizes for the reconstruction work determined?
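
To make question 1 concrete, the short sketch below captures our current mental
model of how a logical byte maps into cells, stripes and the internal blocks of
a block group (using RS-6-3 and a 1 MB cell as an example, and ignoring
block-group boundaries); please correct us if this mapping is wrong.

    public class StripedLayoutSketch {
      public static void main(String[] args) {
        final int numDataUnits = 6;            // data blocks per block group (RS-6-3)
        final long cellSize = 1024L * 1024L;   // 1 MB cell, which we believe is the default

        long offsetInGroup = 10L * 1024 * 1024 + 42;   // an arbitrary byte within the group

        long cellIndex = offsetInGroup / cellSize;           // which cell holds the byte
        long stripeIndex = cellIndex / numDataUnits;          // which row of cells (stripe)
        int blockIndex = (int) (cellIndex % numDataUnits);    // which internal block of the group
        long offsetInBlock = stripeIndex * cellSize + offsetInGroup % cellSize;

        System.out.printf("cell %d, stripe %d, internal block #%d, offset %d in that block%n",
            cellIndex, stripeIndex, blockIndex, offsetInBlock);
      }
    }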


Thanks in advance for your time and consideration.

Regards,
M.V.S.Chaitanya