Shawn Willden wrote:
> I'd like to confuse your question by adding more issues to consider
> ;-)
Excellent! :)

> The probability that a single file will survive isn't just a function
> of the encoding parameters and the failure modes of the servers
> holding its shares, it's also dependent on the availability of its
> cap.

Yes, that's an excellent point. The probability of recovering a file
that's 6 subdirectories deep is the product of the recovery
probabilities of the file and all of its ancestor dirnodes, so there's
a good argument for making the parents stronger than the children. I
think I even filed a ticket years ago suggesting that dirnodes should
get a different (more conservative) set of encoding parameters than
regular immutable files; I believe it was titled "just set k=1 for
mutable files?".

I sometimes draw an analogy with signal processing, where you look at
how much noise each part of the system contributes. If a newly added
component causes an order of magnitude less noise than any other
component, you can effectively ignore its contribution. Likewise, if
dirnodes were 10x more reliable than filenodes, you could chain several
of them together without significantly reducing the probability of
recovering the file. In most environments, simply reducing "k" a bit is
enough to get that 10x improvement, and since dirnodes are (usually)
much smaller than filenodes, it doesn't cost much extra space.
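To make the arithmetic concrete, here's a tiny back-of-the-envelope
sketch in Python (all the probabilities are made up purely for
illustration):

    # Chance of reaching a file = product of the recovery probabilities
    # of every node on its path. The numbers below are invented.
    def path_recovery_probability(node_probs):
        p = 1.0
        for node_p in node_probs:
            p *= node_p
        return p

    p_file = 0.999         # hypothetical filenode recovery probability
    p_dir = 0.999          # dirnodes encoded just like files
    p_strong_dir = 0.9999  # dirnodes encoded more conservatively (lower k)
    depth = 6              # the file lives 6 subdirectories below the root

    # Same encoding everywhere: six dirnodes plus the file itself.
    print(path_recovery_probability([p_dir] * depth + [p_file]))
    # ~0.9930: the chain multiplies the file's failure probability ~7x

    # Dirnodes 10x more reliable (failure probability 1e-4, not 1e-3):
    print(path_recovery_probability([p_strong_dir] * depth + [p_file]))
    # ~0.9984: the dirnode chain's contribution is now nearly negligible

The second case is the "10x less noise" regime: the chain is still
there, but it no longer dominates the file's own failure probability.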
Other than simply encoding dirnodes differently, I can imagine two
straightforward ways to improve the situation. The first is that the
"repairdb" I described earlier could have columns indicating whether
each object is a directory or a file, and how far from the root it was
encountered (the latter could be fooled by non-tree-shaped structures,
but would probably be good enough). This information would then factor
into repair prioritization: senior dirnodes would receive top priority
when deciding how to allocate repair bandwidth (frequency of checking,
priority of repair relative to other victims, and repair threshold,
i.e. willingness to repair even minor injuries). The rootcap would get
the highest priority of all.

The second would be to use the repairdb as a secondary backup. A
two-column "child of" table, holding (parent, child) pairs, would be
enough to completely reassemble the shape and nodes of the graph
(recording childnames as well would preserve the edge names, at the
expense of some space). Of course, we could always serialize this table
and store *that* in the grid as a single file, which would reduce the
maximum path length (and thus the product of probabilities) to two,
assuming you had a good way to remember the filecap of the snapshot. If
the snapshot referenced only the (immutable) files and their pathnames,
we'd have a #204 "virtual CD". If it recorded all the dircaps/filecaps
and their relationships, you'd have a snapshot of the tree at that
point in time, which you could navigate later to extract files
(extracting dirnodes would be dubious, because they may have been
mutated in the meantime).

On one hand, if we feel we need this "extra reliability" on top of
Tahoe, maybe that suggests we should improve Tahoe to provide that
level of reliability natively. On the other hand, maybe data and
metadata are qualitatively different things that deserve different
treatment: if it's relatively cheap to retain a local copy of the
dirnode graph, since it's usually much smaller than the data itself,
then why not? I guess there's a tradeoff between reliability and how
much data you're willing to retain.

Recording just the rootcap means you depend upon longer chains to
retain access to everything else, but you've got a smaller starting
point to keep secure (and the rootcap is constant, so it's easy to keep
safe: no distributed-update protocol necessary). Recording a whole
snapshot shrinks those chains to length zero (but obligates you to
retain that table and keep it up to date). Writing a snapshot into a
mutable file and remembering its filecap makes the chain length one.
There's also a tradeoff between the effort required to walk the tree
and how much data you're willing to retain and keep up to date. One
thing I like about Tahoe's filecap/dircap design is that this tradeoff
is at least easy to understand, and that it's easy to imagine other
ways to manage the filecaps.

Having dircaps in the system seemed (to me, at the time) like a
necessary component, because Tahoe should provide a reasonable native
mechanism for such a common task, and because it nicely meets our goal
of simple fine-grained sharing (whereas one big table would not make it
easy to share an intermediate node without sharing the whole thing).
But, as GridBackup shows, dircaps are hardly the only approach. Indeed,
since "tags" seem to be all the rage these days, there's a good
argument for some users to ditch dirnodes entirely and just manage a
big mutable database that maps search terms to filecaps.

But yeah, the repair process should definitely be cognizant of the
relative importance of the things being repaired. That's part of my
inclination towards the table-based scheme (as opposed to the simple
already-implemented deep-check-and-repair operation): it gives us a
place where this kind of policy can be applied.
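To sketch what such a table-based scheme might look like, here's a
rough cut using sqlite3 from the Python standard library (the schema,
table names, and column names are all invented for illustration; none
of this is existing Tahoe code):

    import sqlite3

    db = sqlite3.connect("repairdb.sqlite")
    db.executescript("""
    CREATE TABLE IF NOT EXISTS objects (
        cap    TEXT PRIMARY KEY,  -- filecap or dircap
        is_dir INTEGER NOT NULL,  -- 1 for dirnodes, 0 for filenodes
        depth  INTEGER NOT NULL,  -- hops from the rootcap (0 = rootcap)
        health REAL               -- share health from the last check
    );
    -- The "child of" table: (parent, child) pairs are enough to
    -- reassemble the shape of the graph; childname preserves the edge
    -- names at the expense of some space.
    CREATE TABLE IF NOT EXISTS edges (
        parent    TEXT NOT NULL,
        child     TEXT NOT NULL,
        childname TEXT
    );
    """)

    def repair_queue(db):
        # Senior dirnodes first: directories before files, shallower
        # before deeper, sicker before healthier. The rootcap (depth 0)
        # naturally sorts to the top.
        return db.execute("""
            SELECT cap FROM objects
            ORDER BY is_dir DESC, depth ASC, health ASC
        """).fetchall()

    def children_of(db, parent_cap):
        # Walking the edges table downward from the rootcap reassembles
        # every path without touching the grid: the secondary-backup use.
        return db.execute(
            "SELECT childname, child FROM edges WHERE parent = ?",
            (parent_cap,)).fetchall()

Serializing those two tables and storing the result in the grid as a
single file would be the "snapshot" variant described above.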
thanks,
 -Brian