On Wed, Mar 10, 2010 at 5:55 PM, David-Sarah Hopwood <[email protected]> wrote:
>
> Suppose that each segment is copied to an independent random choice
> of m of the available servers. Then if all m of the servers for *any*
> segment die, then part of the file will be lost. Losing part of a file
> is essentially equivalent to losing all of it for most applications.
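Here is a back-of-the-envelope sketch of the failure mode described above. This is a toy model, not anything measured from Tahoe-LAFS or a real grid: assume servers die independently with probability p, each segment is replicated on m servers, the file has S segments, and the per-segment server choices are treated as independent (ignoring overlap between segments' server sets). The numbers p, m, and S are made up for illustration.

```python
# Toy comparison of the two placement strategies (illustrative only;
# p, m, and S are made-up numbers, not Tahoe-LAFS parameters).

p = 0.05    # hypothetical chance that any one server dies
m = 3       # copies of each segment
S = 1000    # segments in the file

p_segment_lost = p ** m                   # all m holders of one segment die

# Strategy A: each segment placed on its own independently chosen m servers.
p_any_part_lost_A = 1 - (1 - p_segment_lost) ** S
p_all_parts_lost_A = p_segment_lost ** S  # under the independence assumption
                                          # (so small it underflows to 0.0)

# Strategy B: the whole file kept together on one set of m servers
# (roughly what storing entire shares in one place gives you).
p_any_part_lost_B = p_segment_lost
p_all_parts_lost_B = p_segment_lost

print("A: P(any part lost) = %.3e, P(all parts lost) = %.3e"
      % (p_any_part_lost_A, p_all_parts_lost_A))
print("B: P(any part lost) = %.3e, P(all parts lost) = %.3e"
      % (p_any_part_lost_B, p_all_parts_lost_B))
```

Under this toy model, independent per-segment placement is far more likely to lose *some* part of the file, while keeping everything together means a file is either entirely intact or entirely gone. That is roughly the tension between measures 1 and 2 below.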
I'm sure that keeping all the blocks of a share together is a win if your measure of success is:

1. minimize the chance that any part of this file will be lost

That's why we do it the way we do. (Jukka Santala and Bram Cohen figured this out in the context of Mojo Nation, which broke up a file into a number of blocks proportional to the size of the file and distributed the blocks independently. This was part of the process of Bram inventing BitTorrent.)

However, I'm not sure that is the best measure of success. And I'm not sure whether there are other good ways to do it which would still be good for this measure of success and also good for other measures of success. Other possible measures of success include:

2. minimize the chance that all parts of this file will be lost

This might make more sense for a file where parts of the file still have value even when other parts are lost, such as a long audio recording. It makes less sense for a file where interpreting some parts of the file depends upon other parts, such as an LZMA-compressed database snapshot.

Then there is also the question of the longevity of many files:

A. minimize the chance that any file out of a large set of files will be damaged

B. minimize the number of files from the large set of files which will be damaged

Suppose you have ten thousand files. Would you rather have a 1-in-10 chance that one of your files dies, or a 1-in-100,000 chance that all of your files die? I don't know. Tough call. (See the arithmetic sketch appended below.) Maybe we shouldn't be trying to optimize too much for these sorts of questions.

When Brian is describing the difference between Mountain View and Tahoe-LAFS, he often starts by saying that Tahoe-LAFS stores entire shares (all the blocks of the share) together in one place, so that the uploader, downloader, and server have fewer separate objects to keep track of. Maybe that's the best reason to keep doing it the way we're doing it now.

Regards,

Zooko
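P.S. A quick arithmetic check on the ten-thousand-files question above, using only the numbers already given (it says nothing about any real deployment): the expected number of lost files comes out the same either way, which is part of why it is a tough call.

```python
# Expected number of lost files in the two scenarios above.
files = 10000

# Scenario A: 1-in-10 chance that one of your files dies.
expected_loss_a = (1.0 / 10) * 1          # = 0.1 files

# Scenario B: 1-in-100,000 chance that all ten thousand files die.
expected_loss_b = (1.0 / 100000) * files  # = 0.1 files

print(expected_loss_a, expected_loss_b)   # same average loss
```

So the averages match; the difference is entirely in whether the damage arrives as occasional single-file losses or as a rare catastrophe.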
