Hi, all. I have a couple of questions regarding Reed-Solomon codes and their use in Tahoe. I see in "Tahoe - The Least-Authority Filesystem" that erasure coding is based on two parameters: N, the total number of shares that will be written, and K, the number of shares that must be read to reconstruct the data. Does this imply that the data being written can sustain up to N-K share failures?
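
To pin down my mental model before asking the rest, here is a toy sketch I wrote of a systematic K-of-N code (Lagrange interpolation over a prime field). I know Tahoe's real codec is the zfec library, not this; everything below -- the field, function names, and parameters -- is just my own illustration:

  # Toy systematic k-of-n erasure code over the prime field GF(P).
  # Illustrative only -- Tahoe's actual codec is zfec, not this.
  P = 2**31 - 1  # a prime comfortably larger than the demo symbols

  def lagrange_eval(points, x):
      # Evaluate, at x, the unique degree-(k-1) polynomial passing
      # through the k given (xj, yj) points, working mod P.
      total = 0
      for j, (xj, yj) in enumerate(points):
          num, den = 1, 1
          for l, (xl, _) in enumerate(points):
              if l != j:
                  num = num * (x - xl) % P
                  den = den * (xj - xl) % P
          total = (total + yj * num * pow(den, P - 2, P)) % P
      return total

  def encode(data, n):
      # Systematic: share i is f(i), and f(i) == data[i] for i < k,
      # so the first k shares are the data symbols themselves.
      points = list(enumerate(data))
      return [lagrange_eval(points, x) for x in range(n)]

  def decode(shares, k):
      # Any k (index, value) pairs determine f; re-read the data points.
      return [lagrange_eval(shares, x) for x in range(k)]

  k, n = 3, 5
  data = [10, 20, 30]
  shares = encode(data, n)
  assert shares[:k] == data                          # "systematic"
  survivors = [(i, shares[i]) for i in (1, 3, 4)]    # n - k = 2 lost
  assert decode(survivors, k) == data                # still recoverable

If I have this right, the N-K loss tolerance and the systematic property both fall out of the same fact: any K points determine the polynomial.
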
I know that the rateless erasure code folks use the term "systematic" to mean an erasure code where all of the input symbols appear among the output symbols. I don't know how widely that term is used, but Reed-Solomon is a systematic erasure code, isn't it? That is, in the absence of failures, I can always read from my data strips rather than my parity/check strips, right?

In the presence of a failure, when I need to rebuild one or more strips, is it true that I always need to read K strips to rebuild them? Please help me work through the following scenario (my own attempt at the arithmetic is just below the list):

1. I have 10 storage nodes in my system; 10TB of data (prior to erasure coding) is spread evenly across all 10 nodes, and I can suffer one failure.

2. Some lunatic attacks one of my storage nodes with an axe. :) I commission a new, empty node to take its place.

3. Is it true that, in aggregate, I need to move 10TB of data around my distributed system to produce the 1TB of data for the new node?
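Here is my back-of-envelope arithmetic for that scenario, assuming a 9-of-10 encoding (K=9, N=10) so the system tolerates exactly one failure -- the figures and variable names are my own assumptions, not anything from the Tahoe docs:

  # Back-of-envelope repair cost for a classic K-of-N Reed-Solomon
  # layout; a 9-of-10 encoding is assumed purely for illustration.
  K, N = 9, 10
  data_tb = 10.0                 # data before erasure coding
  stored_tb = data_tb * N / K    # ~11.1 TB of shares after encoding
  per_node_tb = stored_tb / N    # ~1.1 TB of shares on each node
  # Rebuilding the axed node means re-decoding every stripe it held,
  # and decoding a stripe needs K surviving strips, so:
  repair_read_tb = K * per_node_tb   # ~10 TB read to rewrite ~1.1 TB
  print(repair_read_tb, per_node_tb)

If I have set that up correctly, the new node actually holds ~1.1TB rather than the 1TB I said above (coding overhead), and rebuilding it reads ~10TB -- which is what prompted question 3.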
Thanks very much for any help,

Russ