Interesting ideas, Shawn. I'll just respond to a couple of details, below, plus I'll say that I think Kevan should proceed with his plan to implement "servers of happiness" now, as it is a small enough change that he can finish it before the fall semester starts. :-)
In the long run, I can see how some people (possibly including me) would like the sort of sophisticated, heuristical, statistical approach that you envision, but I can also see how some people (possibly including me) would want a dumber, more predictable set of rules, such as "Make sure there are always at least K+1 shares in each of these Q co-los, and that's the entire ruleset."

On Wednesday, 2009-08-12, at 9:01, Shawn Willden wrote:

> 1. Though I've previously indicated that it's a bad idea to keep a
> share locally when uploading for backup, I've reconsidered this
> notion. A local share is useless for ONE purpose of backups --
> disaster recovery -- but it improves performance of retrievals for
> many other purposes of backups.

Ah, good point!

> 2. Retrieval performance is maximized when shares are retrieved
> from as many servers at once (assuming all are roughly equally
> responsive). This means that K should be set to the size of the
> grid, and M adjusted based on reliability requirements. This is
> the reverse of what I have been thinking.

I went through a similar reversal:

http://allmydata.org/pipermail/tahoe-dev/2009-April/001554.html # using the volunteer grid (dogfood tasting report, continued)

> 3. M larger than the grid means that each server will receive
> multiple shares. A reasonable first assumption is that all shares
> on a given server will survive or fail together, so the effective
> reliability of a file is a function not of how many shares must be
> lost for it to disappear, but how many servers.

Yes, thus the motivation for #778.

By the way, I wonder if #678 would interest you. If we had #678, then your strategy could delay making up its mind about the best value of M until later, during a repair process, possibly after the set of servers and their known qualities has changed. It should be relatively easy to make up your mind about a good value for K -- for example, maybe just K = number-of-servers for starters?
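
To make point 3 concrete, here is a rough Python sketch of counting by servers rather than by shares. It is only an illustration, not Tahoe code: the function names are made up for this mail, it assumes the M shares are spread round-robin over the servers, and it assumes each server survives independently with the same probability p (real grids will violate both assumptions).

```python
from itertools import combinations


def shares_per_server(m, n_servers):
    """Hypothetical placement: spread m shares round-robin over n_servers."""
    base, extra = divmod(m, n_servers)
    return [base + (1 if i < extra else 0) for i in range(n_servers)]


def min_server_losses_to_kill(k, counts):
    """Smallest number of lost servers that leaves fewer than k shares.

    Worst case: the servers holding the most shares are lost first.
    """
    remaining = sum(counts)
    if remaining < k:
        return 0  # the file is already unrecoverable
    for lost, c in enumerate(sorted(counts, reverse=True), start=1):
        remaining -= c
        if remaining < k:
            return lost
    return len(counts)


def survival_probability(k, counts, p):
    """Probability that at least k shares survive.

    Assumes all shares on a server live or die together and that each
    server survives independently with probability p. Enumerates every
    subset of surviving servers, so it is only usable for small grids.
    """
    n = len(counts)
    total = 0.0
    for r in range(n + 1):
        for alive in combinations(range(n), r):
            if sum(counts[i] for i in alive) >= k:
                total += p ** r * (1 - p) ** (n - r)
    return total


if __name__ == "__main__":
    counts = shares_per_server(10, 5)  # M=10 on a 5-server grid
    print(counts)
    print(min_server_losses_to_kill(3, counts))
    print(survival_probability(3, counts, 0.9))
```

With K=3 and M=10 on five servers, each server holds two shares. Counting shares, the file looks like it tolerates seven losses, but losing four servers is enough to make it unrecoverable, which is the gap between "shares of happiness" and "servers of happiness".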

#678 is not a ticket that can be fixed before the fall semester starts, though. Hopefully it will be fixed by the next generation of capabilities (http://allmydata.org/trac/tahoe/wiki/NewCapDesign).

Regards,

Zooko

tickets mentioned in this mail:

http://allmydata.org/trac/tahoe/ticket/778 # "shares of happiness" is the wrong measure; "servers of happiness" is better
http://allmydata.org/trac/tahoe/ticket/678 # converge same file, same K, different M

_______________________________________________
tahoe-dev mailing list
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
