On Wed, Feb 2, 2011 at 4:26 PM, Chris Palmer <[email protected]> wrote:
> Shawn Willden writes:
>
> > The same situation applies, though. For any given expansion factor E,
> > assuming moderately high server reliability, you'll get better net
> > reliability with N/E-of-N than with 1-of-E. As E goes up, the level of
> > server reliability required for break-even declines, and the advantage of
> > erasure coding increases.
>
> I don't understand what you mean. Can you please fill in an example with
> real numbers?

Sure. Suppose that you're willing to accept an expansion factor of 5.
Suppose also that your servers are 90% reliable over some time interval, and
that they fail independently.

With a 1-of-5 scheme, you don't lose your data unless all five servers fail
within the interval, which happens with a probability of 0.1**5 = 1e-5. But
a 2-of-10 scheme only loses data if at least nine of the ten servers fail
within the interval, which happens with probability of about 9e-9.

Going for a more extreme example, a 10-of-50 scheme requires the failure of
at least 41 servers, which happens with probability of roughly 1e-32 -- which
is so ludicrously small that the only way your data will be lost is if
there's some sort of event that causes widespread failures.

Of course, 1-of-10 is better than 2-of-10 and 1-of-50 is better than
10-of-50, but for a given expansion factor, erasure coding across a large
number of servers is always better, and significantly so.

> Isn't my scheme simply the most reliable and most expensive form of erasure
> coding?

Reliability and cost are independent and largely opposing parameters, so I
don't really know how you can evaluate that. Certainly, for a given level of
reliability your scheme is the most expensive form of erasure coding, and for
a given cost it's the least reliable* :-)

--
Shawn

* Ignoring failures avoided by simplicity.
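For anyone who wants to check the arithmetic, here is a minimal Python sketch. It is not part of Tahoe-LAFS; the function name `loss_probability` and its parameters are made up for illustration. It assumes that servers fail independently, each with probability 0.1 over the interval, and that a k-of-n file is lost once fewer than k servers survive, that is, once at least n - k + 1 servers fail.

```python
from math import comb


def loss_probability(k, n, p_fail=0.1):
    """Chance of losing a k-of-n encoded file.

    Hypothetical helper: assumes each of the n servers fails independently
    with probability p_fail. The file is lost when fewer than k servers
    survive, i.e. when at least n - k + 1 servers fail (binomial tail).
    """
    return sum(
        comb(n, failed) * p_fail**failed * (1 - p_fail) ** (n - failed)
        for failed in range(n - k + 1, n + 1)
    )


# All three schemes have the same expansion factor, n / k = 5.
for k, n in [(1, 5), (2, 10), (10, 50)]:
    print(f"{k}-of-{n}: loss probability {loss_probability(k, n):.1e}")
```

Under those assumptions the script gives about 1e-5 for 1-of-5, about 9.1e-9 for 2-of-10, and a little under 1e-32 for 10-of-50. The independence assumption is the weak point: correlated failures (a shared datacenter, a common software bug) would make the real numbers worse for every scheme.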
_______________________________________________
tahoe-dev mailing list
[email protected]
http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
