[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2014-11-24 Thread Vincent.Wei (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224107#comment-14224107
 ] 

Vincent.Wei commented on HDFS-503:
--

Does anybody know how to build this patch on Hadoop v2.2.0?

 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: contrib/raid
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.21.0

 Attachments: raid1.txt, raid2.txt


 The goal of this JIRA is to discuss how the cost of raw storage for a HDFS 
 file system can be reduced. Keeping three copies of the same data is very 
 costly, especially when the size of storage is huge. One idea is to reduce 
 the replication factor and do erasure coding of a set of blocks so that the 
 overall probability of failure of a block remains the same as before.
 Many forms of error-correcting codes are available, see 
 http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
 described DiskReduce 
 https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
 My opinion is to discuss implementation strategies that are not part of base 
 HDFS, but are a layer on top of HDFS.
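The storage trade-off described above can be made concrete with a little arithmetic. The sketch below compares raw bytes stored per byte of user data for plain replication and for a parity-based stripe. The class name, the method names and the example parameters (a stripe of 10 blocks, one XOR parity block, four RS parity blocks) are illustrative assumptions, not values taken from this issue.

```java
// Sketch: raw storage used per byte of user data.
// All names and parameter values here are illustrative assumptions.
class StorageOverhead {
    /** Plain replication: every block is stored `replication` times. */
    static double replicated(int replication) {
        return replication;
    }

    /**
     * Parity-based stripe: `dataBlocks` source blocks kept at `dataReplication`
     * copies each, plus `parityBlocks` parity blocks kept at `parityReplication`
     * copies each. The result is normalised by the number of source blocks.
     */
    static double erasureCoded(int dataBlocks, int dataReplication,
                               int parityBlocks, int parityReplication) {
        double raw = (double) dataBlocks * dataReplication
                   + (double) parityBlocks * parityReplication;
        return raw / dataBlocks;
    }

    public static void main(String[] args) {
        System.out.println("3x replication:              " + replicated(3));
        System.out.println("XOR, 10 + 1, 2 copies each:  " + erasureCoded(10, 2, 1, 2));
        System.out.println("RS, 10 + 4, 1 copy each:     " + erasureCoded(10, 1, 4, 1));
    }
}
```

Under these assumed parameters the overhead drops from 3 copies to about 2.2 with XOR and about 1.4 with RS. Whether the failure probability stays comparable depends on the code and the stripe layout, which this sketch does not model.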



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-11-23 Thread Hemanth Makkapati (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156434#comment-13156434
 ] 

Hemanth Makkapati commented on HDFS-503:


Hey,
I am a beginner with Hadoop and only recently started delving into the code.
As I was trying to get RAID up and running, I observed the following exception 
in the log:

ERROR org.apache.hadoop.raid.RaidNode: java.lang.NullPointerException
at org.apache.hadoop.raid.RaidNode.tmpHarPathForCode(RaidNode.java:1491)
at org.apache.hadoop.raid.RaidNode.doHar(RaidNode.java:1217)
at org.apache.hadoop.raid.RaidNode.access$300(RaidNode.java:73)
at org.apache.hadoop.raid.RaidNode$HarMonitor.run(RaidNode.java:1371)
at java.lang.Thread.run(Thread.java:636)

The reason seems to be the absence of the 'erasurecode' tag in the raid 
configuration file, which in my case is very similar to the sample 
configuration file provided. Once I added the tag, which can take the value 
XOR or RS, I no longer saw the exception. The README file doesn't mention 
anything about such a tag either. 
Please confirm whether my observation is correct.
I thought of posting it here for the benefit of others.
BTW, I checked out the code from trunk.

Thank you.




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-07-06 Thread sri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060343#comment-13060343
 ] 

sri commented on HDFS-503:
--

I have a couple of questions:

1) With RAID set up, I am not able to generate the DFSAdmin report (hadoop 
dfsadmin -report). Why is that so?

2) I am not able to reduce the targetReplicationFactor to 0 (I want to run 
MapReduce where the BlockFixer retrieves the data from the raided disks). Is 
there any way to do this?

Thanks in advance





[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-07-06 Thread sri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060450#comment-13060450
 ] 

sri commented on HDFS-503:
--

I would like to know whether the stripes act only as a recovery option (when 
other datanodes have failed), or whether they can also act as input to 
MapReduce jobs (to satisfy locality). 






[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-07-06 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060975#comment-13060975
 ] 

dhruba borthakur commented on HDFS-503:
---

1. Raid has no impact on dfsadmin -report command.

2. You won't be able to set a replication factor to 0. You would have to 
manually pull the plug (kill it) on a datanode to see how raid works.

3. stripe locations do not contribute to split locations of a block, thus they 
are not used for map-reduce locality.
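To make point 2 concrete: when a datanode holding a block is killed, an XOR-coded stripe lets the missing block be rebuilt from the surviving blocks plus the parity block. The following is a minimal, self-contained sketch of that idea only. It is not the RaidNode or BlockFixer code, and the class name, method names and three-byte "blocks" are made up for illustration.

```java
import java.util.Arrays;

// Sketch of XOR parity over one stripe; all names are hypothetical.
class XorParityDemo {
    /** XOR all blocks together byte by byte; blocks must have equal length. */
    static byte[] xorAll(byte[][] blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= block[i];
            }
        }
        return out;
    }

    /** Rebuild the block at index `lost` from the other blocks and the parity. */
    static byte[] recover(byte[][] blocks, int lost, byte[] parity) {
        byte[][] inputs = new byte[blocks.length][];
        for (int i = 0; i < blocks.length; i++) {
            // The lost block is never read; the parity block takes its place.
            inputs[i] = (i == lost) ? parity : blocks[i];
        }
        return xorAll(inputs);
    }

    public static void main(String[] args) {
        byte[][] stripe = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] parity = xorAll(stripe);
        byte[] rebuilt = recover(stripe, 1, parity);
        System.out.println("recovered block matches: " + Arrays.equals(rebuilt, stripe[1]));
    }
}
```

A single XOR parity block can cover only one missing block per stripe. Tolerating more simultaneous losses is where a code such as RS comes in.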





[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-06-12 Thread sri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048444#comment-13048444
 ] 

sri commented on HDFS-503:
--

Error in name node 
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.dfs.DistributedRaidFileSystem
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1304)
at org.apache.hadoop.fs.FileSystem.access$100(FileSystem.java:65)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1328)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:109)
at org.apache.hadoop.fs.Trash.init(Trash.java:62)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startTrashEmptier(NameNode.java:292)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:288)
at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:434)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.dfs.DistributedRaidFileSystem
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:334)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:864)
... 11 more

Can somebody help me out?
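A ClassNotFoundException like the one above means the class named in the configuration could not be loaded on the NameNode's classpath. One way to narrow this down is to check which class names are actually loadable from the same classpath. The sketch below is only an illustration and is not part of the raid contrib: the class ClassPathCheck is hypothetical, and the second candidate name (in the org.apache.hadoop.hdfs package) is an assumption that should be verified against the raid jar that was actually built.

```java
// Sketch: report whether candidate class names can be loaded.
// ClassPathCheck is a hypothetical helper, not a Hadoop class.
class ClassPathCheck {
    /** Returns true if the named class can be found by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initialisers.
            Class.forName(className, false, ClassPathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            // Name taken from the stack trace above.
            "org.apache.hadoop.dfs.DistributedRaidFileSystem",
            // Assumed alternative package; verify against your build.
            "org.apache.hadoop.hdfs.DistributedRaidFileSystem"
        };
        for (String name : candidates) {
            System.out.println(name + " -> "
                + (isLoadable(name) ? "found" : "not on classpath"));
        }
    }
}
```

Run it with the same classpath the NameNode uses. If neither name is found, the raid jar is probably missing from that classpath. If only one is found, the configured class name likely needs to match it.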






[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-06-12 Thread Robert Chansler (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048446#comment-13048446
 ] 

Robert Chansler commented on HDFS-503:
--

I'll look forward to reading your message when I return to the office Friday 17 
June.






[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-06-02 Thread Krishnaraj (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042930#comment-13042930
 ] 

Krishnaraj commented on HDFS-503:
-

Is there any stable version of Hadoop erasure coding? Where can I download its 
source code? I am not able to find it in the Hadoop trunk.



[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-06-02 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043211#comment-13043211
 ] 

dhruba borthakur commented on HDFS-503:
---

The source code is in 
http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk/src/contrib/raid



[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-06-02 Thread Krishnaraj (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043213#comment-13043213
 ] 

Krishnaraj commented on HDFS-503:
-

I got this patch, took HDFS from 
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/, put it in contrib, and 
built it. But I don't know how to use it further (i.e., in place of the Hadoop 
jar that we set up in the cluster; I did not get the jar mentioned in the 
README). Is there any detailed tutorial?




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-10-04 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917481#action_12917481
 ] 

Ramkumar Vadali commented on HDFS-503:
--

@shravankumar, to get a basic idea of HDFS RAID, you can read Dhruba's blog 
post: 
http://hadoopblog.blogspot.com/2009/08/hdfs-and-erasure-codes-hdfs-raid.html

If you need this for demo purposes, could you use the current hadoop trunk? I 
am not sure about the exact date of the next release. 
To use RAID, you need to create a configuration file and start the RAID daemon. 
You can look for examples in the unit tests, say TestRaidNode.


For further communication, you can contact me directly.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-10-03 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917286#action_12917286
 ] 

shravankumar commented on HDFS-503:
---

@Ramkumar Vadali, thank you, sir.
Where can I access the RAID code (the version with the fixes)?





[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-09-30 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916677#action_12916677
 ] 

Ramkumar Vadali commented on HDFS-503:
--

@shravankumar Quite a few bugs in raid have been fixed in trunk. This will be 
part of the upcoming release hadoop-0.22. What do you mean by raid API?




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-09-24 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914757#action_12914757
 ] 

shravankumar commented on HDFS-503:
---

Can anyone help me? In which stable version of Hadoop does RAID become a part, 
and how can I access the API documents related to RAID?




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-07-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884405#action_12884405
 ] 

Celina d´ Ávila Samogin commented on HDFS-503:
--

I intend to propose something about the implementation of erasure coding 
techniques in HDFS, starting in July 2010. I will add a comment to say what 
I'm doing or to ask for hints as soon as possible. For now, I have studied the 
texts suggested in this issue and other papers. I have read about RS codes and 
LDPC codes. I have not yet started implementing or testing.




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-06-07 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876178#action_12876178
 ] 

shravankumar commented on HDFS-503:
---

Thank you sir.




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-06-07 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876187#action_12876187
 ] 

shravankumar commented on HDFS-503:
---

Hello sir,

1. What is the meaning of this?
srcPath prefix=hdfs://dfs1.xxx.com:8000/user/dhruba/

2. Under ADMINISTRATION, "RaidNode Software" is mentioned. What does it mean?

3. "In HADOOP_HOME, run ant package to build Hadoop and its contrib packages."
Does ant come with the Hadoop 0.20.1 installation, or do we need to download 
the ant package separately?




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-06-02 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874514#action_12874514
 ] 

shravankumar commented on HDFS-503:
---

Hello sir,

1. What is the meaning of this?
   srcPath prefix=hdfs://dfs1.xxx.com:8000/user/dhruba/

2. Under ADMINISTRATION, "RaidNode Software" is mentioned. What does it mean?

3. "In HADOOP_HOME, run ant package to build Hadoop and its contrib packages." 
   Does ant come with the Hadoop 0.20.1 installation, or do we need to 
   download the ant package separately?




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-06-02 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874518#action_12874518
 ] 

shravankumar commented on HDFS-503:
---

Are the tags (property, description) used in the program normal HTML tags, or 
do they have a different meaning?
Can you send me a document that explains the meaning of these tags?




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-06-02 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874902#action_12874902
 ] 

Rodrigo Schmidt commented on HDFS-503:
--

The tags are XML.

There is no documentation for the tags, either.

In short, Raid is still being optimized and changes are constant. Any strong 
documentation effort at this point would be meaningful for a very short period 
of time.

The source code is the best and most precise documentation you can rely upon. 
That's the good thing about open source projects. You can easily get around 
stale documentation.




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-06-01 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12873913#action_12873913
 ] 

shravankumar commented on HDFS-503:
---

Thank you.




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-05-31 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12873571#action_12873571
 ] 

shravankumar commented on HDFS-503:
---

Thank you sir.
I have one more query: raid1.txt and raid2.txt look similar. What is the 
difference between them?
Does the parity implementation use a normal CRC or some other mechanism, such 
as Reed-Solomon codes?


Shravan Kumar.







[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-05-31 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12873572#action_12873572
 ] 

shravankumar commented on HDFS-503:
---

Are there any design diagrams, such as a class diagram, for raid1.txt and raid2.txt?




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-05-31 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12873589#action_12873589
 ] 

Rodrigo Schmidt commented on HDFS-503:
--

raid1.txt and raid2.txt are different patches. The most recent was the one that 
got committed.

Raid is implementing simple xor parity right now, but we have plans to extend 
it in the future.

Sorry, no design diagrams that I'm aware of.




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-04-28 Thread shravankumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12861740#action_12861740
 ] 

shravankumar commented on HDFS-503:
---

Dear sir,
I have downloaded the code for "Implement erasure coding as a layer on HDFS", 
but I was unable to execute it. Please guide me regarding this.

 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: contrib/raid
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.22.0

 Attachments: raid1.txt, raid2.txt





[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2010-04-28 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12861924#action_12861924
 ] 

Rodrigo Schmidt commented on HDFS-503:
--



Hi, 

Can you provide more details on what you have done and what didn't work?

Did you follow the instructions in the README file? Which error did you see?

Cheers,
Rodrigo











[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-11-09 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775050#action_12775050
 ] 

dhruba borthakur commented on HDFS-503:
---

I am investigating




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-10-06 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762538#action_12762538
 ] 

dhruba borthakur commented on HDFS-503:
---

{quote}

 +1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 13 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of 
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

{quote}


 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: raid1.txt, raid2.txt





[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754683#action_12754683
 ] 

Hadoop QA commented on HDFS-503:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419417/raid2.txt
  against trunk revision 814221.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 13 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 157 release audit warnings 
(more than the trunk's current 154 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/24/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/24/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/24/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/24/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/24/console

This message is automatically generated.

 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.21.0

 Attachments: raid1.txt, raid2.txt





[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-09-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754787#action_12754787
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-503:
-

 I created a separate JIRA HDFS-600 to make the Parity generation algorithm 
 pluggable.
Thanks, Dhruba.




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-09-09 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753349#action_12753349
 ] 

Raghu Angadi commented on HDFS-503:
---

This seems pretty useful. Since this is done outside HDFS, it is simpler for 
users to start experimenting.

Say a file has 5 blocks with replication of 3 : total 15 blocks
With this tool, replication could be reduced to 2, with one block for parity : 
total 10 + 2 blocks
This is a savings of 20% space. Is this math correct?
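
The arithmetic above can be checked with a short Java sketch. The class and method names are made up for illustration and are not part of the patch. It assumes the single parity block is also stored with a replication of 2, which is how the "10 + 2" figure reads.

```java
// Hypothetical helper, not part of the patch: counts raw blocks stored on disk.
final class RaidSpaceMath {
    /** Raw blocks stored = data blocks times their replication, plus the same for parity. */
    static int rawBlocks(int dataBlocks, int dataReplication,
                         int parityBlocks, int parityReplication) {
        return dataBlocks * dataReplication + parityBlocks * parityReplication;
    }

    /** Fraction of raw space saved when going from `before` to `after` raw blocks. */
    static double savings(int before, int after) {
        return (before - after) / (double) before;
    }

    public static void main(String[] args) {
        int before = rawBlocks(5, 3, 0, 0); // 5 blocks, replication 3: 15
        int after = rawBlocks(5, 2, 1, 2);  // replication 2 plus one parity block: 10 + 2
        System.out.printf("before=%d after=%d savings=%.0f%%%n",
                before, after, 100 * savings(before, after));
    }
}
```

Under that assumption the file drops from 15 to 12 raw blocks, so 3 of 15 blocks are saved, which matches the 20% figure.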

Detecting when to 'unRaid': 
  * The patch does this using a wrapper filesystem over HDFS.
  ** This requires the file to be read by the client. 
  ** More often than not, HDFS knows about irrecoverable blocks long before 
a client reads them.
  ** This is only semi-transparent to the users, since they have to use the new 
filesystem.

  * Another completely transparent alternative could be to make 'RaidNode' ping 
NameNode for missing blocks.
  ** NameNode already knows about blocks that don't have any known good 
replica. And fetching that list is cheap.
  ** RaidNode could check if the corrupt/missing block belongs to any of 
its files. 
  ** Rest of RaidNode pretty much remains the same as this patch.
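
The polling alternative could look roughly like the Java sketch below. Everything in it is hypothetical: `MissingBlockSource` only stands in for whatever call would fetch the NameNode's list of blocks without a good replica, and the map of raided files is an assumed bookkeeping structure, not something taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Hadoop or in the patch.
final class MissingBlockScan {
    /** Stand-in for "ask the NameNode which blocks have no known good replica". */
    interface MissingBlockSource {
        Set<Long> missingBlockIds();
    }

    /**
     * Returns the raided files that own at least one missing block.
     * raidedFiles maps a file path to the ids of the blocks it consists of.
     */
    static List<String> filesNeedingRecovery(MissingBlockSource nameNode,
                                             Map<String, List<Long>> raidedFiles) {
        Set<Long> missing = nameNode.missingBlockIds(); // one call per scan
        List<String> damaged = new ArrayList<>();
        for (Map.Entry<String, List<Long>> file : raidedFiles.entrySet()) {
            for (long blockId : file.getValue()) {
                if (missing.contains(blockId)) {
                    damaged.add(file.getKey());
                    break; // one missing block is enough to schedule recovery
                }
            }
        }
        return damaged;
    }
}
```

With this shape the client never has to read a file for a missing block to be noticed. The RaidNode would run the scan periodically and rebuild the affected files from parity.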

 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: raid1.txt





[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-08-28 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749047#action_12749047
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-503:
-

Took a quick look at the patch.  Very cool!

It seems that the parity is computed by xor.  If there is a clean api, we may 
improve it by some advanced codes like Reed-Solomon in the future.
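
To make the idea concrete, here is a minimal Java sketch of xor parity behind a small interface. The names `ParityCodec` and `XorCodec` are assumptions for illustration and are not the API of the patch. It also assumes all blocks have the same length and that at most one block is lost.

```java
import java.util.Arrays;

// Hypothetical names throughout; this is not the API of the patch.
final class XorParityDemo {
    /** A possible "clean api": another code such as Reed-Solomon could implement it. */
    interface ParityCodec {
        byte[] encode(byte[][] dataBlocks);
        byte[] recover(byte[][] survivingBlocks, byte[] parity);
    }

    static final class XorCodec implements ParityCodec {
        /** Parity is the byte-wise XOR of all equally sized data blocks. */
        public byte[] encode(byte[][] dataBlocks) {
            byte[] parity = new byte[dataBlocks[0].length];
            for (byte[] block : dataBlocks) {
                for (int i = 0; i < parity.length; i++) {
                    parity[i] ^= block[i];
                }
            }
            return parity;
        }

        /** XOR of the parity with the surviving blocks yields the one lost block. */
        public byte[] recover(byte[][] survivingBlocks, byte[] parity) {
            byte[] lost = parity.clone();
            for (byte[] block : survivingBlocks) {
                for (int i = 0; i < lost.length; i++) {
                    lost[i] ^= block[i];
                }
            }
            return lost;
        }
    }

    public static void main(String[] args) {
        byte[][] data = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        ParityCodec codec = new XorCodec();
        byte[] parity = codec.encode(data);
        // Pretend data[1] was lost and rebuild it from the remaining blocks.
        byte[] rebuilt = codec.recover(new byte[][] { data[0], data[2] }, parity);
        System.out.println(Arrays.equals(rebuilt, data[1])); // prints true
    }
}
```

Xor tolerates only a single lost block per group, which is the main reason a stronger code would be worth plugging in behind the same interface.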




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-08-28 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749081#action_12749081
 ] 

dhruba borthakur commented on HDFS-503:
---

Hi Nicholas, I agree with you completely. The current patch implements basic 
xor. Once this patch is accepted by the community, I plan to make the algorithm 
pluggable, so that people can plug in more advanced erasure codes into the  
framework laid out by this patch.

If you have the time and energy, please review the patch and provide any 
feedback you may have. Thanks.




[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-07-24 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735011#action_12735011
 ] 

Hong Tang commented on HDFS-503:


As a reference, FAST 09 has a paper that benchmarks the performance of various 
open source erasure coding implementations: 
http://www.cs.utk.edu/~plank/plank/papers/FAST-2009.html.

 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: dhruba borthakur
