On 10/25/11 9:20 AM, Dirk Loss wrote:
> To foster my understanding, I've tried to visualize what that means:
>
> http://dirk-loss.de/tahoe-lafs_nhk-defaults.png
Wow, that's an awesome picture! If we ever get to produce an animated cartoon of precious shares and valiant servers and menacing attackers battling it out while the user nervously hits the Repair button, I'm hiring you for sure :).

But yeah, your understanding is correct. It may help to know that "H" is a relatively recent addition (1.7.0, June 2010). The original design had only k=3 and N=10, but assumed that you'd only upload files in an environment with at least N servers (in fact an older design, maybe 1.2.0 in 2009, had the Introducer tell all clients what k/N to use, instead of them picking it for themselves). Our expectation was thus that you'd get no more than one share per server, so "losing 7 servers" was equivalent to "losing 7 shares", leaving you with 3 (>=k).

I designed the original uploader to allow uploads in the presence of fewer than N servers, by storing multiple shares per server as necessary to place all N shares. The code strives for uniform placement (it won't put 7 shares on server A and then 1 each on servers B, C, and D unless they're nearly full). My motivation was to improve the out-of-the-box experience (where you spin up a test grid with just one or two servers, but don't think to modify your k/N to match), and to allow reasonable upgrades to more servers later (by migrating the doubled-up shares to new servers, keeping the filecaps and encoding the same, but improving the diversity).

There was a "shares of happiness" setting in that original uploader, but it was limited to throwing an exception if too many servers dropped off *during* the upload itself (which commits to a fixed set of servers at the start of the process). I still expected there to be plenty of servers available, so re-trying the upload would still get you full diversity.

The consequences of my choosing write-availability over reliability show up when some of your servers are already down when you *start* the upload (this wasn't a big deal for the AllMyData production grid, but it happens much more frequently in a volunteer grid). You might think you're on a grid with 20 servers, but it's 2AM and most of those boxes are turned off, so your upload only actually gets to use 2 servers. The old code would cheerfully put 5 shares on each, and now you've got a 2POF (dual point of failure). The worst case was when your combination client+server hadn't really managed to connect to the network yet, and stored all the shares on itself (a SPOF).

You might prefer to get a failure rather than a less-reliable upload: to choose reliability over the availability of writes. So 1.7.0 changed the old "shares of happiness" into a more accurate (but more confusing) "servers of happiness" (but unfortunately kept the old name). It also overloaded what "k" means. You now set "H" to be the size of a "target set", and the uploader makes sure that any k-sized subset of this target will have enough shares to recover the file. That means that H and k are counting *servers* now. (N and k still control encoding as usual, so k also counts shares, but share *placement* is constrained by H and k.) The uploader refuses to succeed unless it can get sufficient diversity, where H and k define what "sufficient" means. (There may be situations where an upload would fail even though your data would still have been recoverable: shares-of-happiness is meant to ensure a given level of diversity/safety, i.e. to choose reliability over write-availability.)

So the new "X servers may fail and your data is still recoverable" number comes from H-k (both counting servers).
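In case it helps to see that rule as code: here's a toy brute-force Python sketch (purely hypothetical, nothing like the real uploader internals) of the criterion as I described it, namely that there must be some H-server target set in which any k servers together hold at least k distinct shares, so the file survives the loss of H-k of them:

    from itertools import combinations

    def is_happy(placement, k, h):
        # placement: dict mapping server name -> set of share numbers it holds
        # (toy model only; the real uploader verifies this differently)
        servers = list(placement)
        for target in combinations(servers, h):
            if all(len(set().union(*(placement[s] for s in subset))) >= k
                   for subset in combinations(target, k)):
                return True   # found an H-server target set satisfying the rule
        return False

    # k=3, H=7: two servers with 5 shares each (the 2AM scenario) should fail...
    doubled_up = {"A": {0, 1, 2, 3, 4}, "B": {5, 6, 7, 8, 9}}
    print(is_happy(doubled_up, k=3, h=7))    # False: the upload must refuse

    # ...while one share on each of 10 servers is fine: any 3 of the 7-server
    # target set are enough, so H-k=4 of them can fail.
    uniform = {"s%d" % i: {i} for i in range(10)}
    print(is_happy(uniform, k=3, h=7))       # True

That's just the rule from the paragraph above spelled out; treat it as an illustration of the guarantee, not of how the uploader actually chooses or checks placements.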
The share-placement algorithm still tries for uniformity, and if it achieves that then you can tolerate even more failures (up to N-k if you manage to get one share per server).

I'm still not sure the servers-of-happiness metric is ideal. While it lets you specify a safety level more accurately/meaningfully in the face of insufficient servers, it's always been harder to explain and understand. Some days I'm in favor of a more-absolute list of "servers that must be up" (I mocked up a control panel in http://tahoe-lafs.org/pipermail/tahoe-dev/2011-January/005944.html). Having a minimum-reliability constraint is good, but you also want to tell your client just how many servers you *expect* to have around, so it can tell you whether it can satisfy your demands.

I still think it'd be pretty cool if the client's Welcome page had a little game or visualization where it could show you, given the current set of available servers, whether the configured k/H/N could be satisfied or not: something to help explain the share-placement rules and help users set reasonable expectations.

cheers,
 -Brian
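P.S.: That visualization could start out as simple as a worst-case failure count. Here's another toy brute-force Python sketch (hypothetical again, and exponential in the number of servers, so purely for illustration): given a placement, it finds the largest number of simultaneous server failures the file is guaranteed to survive, which shows both the up-to-N-k bonus from one-share-per-server placement and the 2POF of the doubled-up case above:

    from itertools import combinations

    def worst_case_tolerance(placement, k):
        # Largest f such that *every* choice of f failed servers still leaves
        # at least k distinct shares on the survivors (-1: never recoverable).
        servers = list(placement)
        best = -1
        for f in range(len(servers) + 1):
            survivable = all(
                len(set().union(*(placement[s] for s in survivors))) >= k
                for survivors in combinations(servers, len(servers) - f))
            if not survivable:
                break
            best = f
        return best

    uniform = {"s%d" % i: {i} for i in range(10)}   # one share per server
    print(worst_case_tolerance(uniform, k=3))       # 7, i.e. N-k

    doubled_up = {"A": {0, 1, 2, 3, 4}, "B": {5, 6, 7, 8, 9}}
    print(worst_case_tolerance(doubled_up, k=3))    # 1: lose both and it's gone

A real Welcome-page version would need to be cleverer than brute force, but even a crude "you can lose X servers right now" number might set expectations better than k/H/N alone.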
