On Saturday 20 December 2008 12:50, Michael Pattrick wrote:
> >> Just another filesystem I am afraid, with one big advance
> >> (versioning) and a number of incremental ones.
> And that's the way to go about it, it may seem easy to tack side
> projects on but scopecreep (ScopeCreep: the devourer of souls and
> government projects) could destroy Tux3. Adding this type of feature
> would increase the complexity of a filesystem with the stated goal of
> having a 'tight' code base. Having a well defined list of reasonable
> features increases the likelihood that a project will be successful,
> adding a feature like this right now - just as Tux3 is preparing for
> its mainline merge- could delay the merge, increase the time needed to
> document, increase code complexity, and possibly introduce new types
> of bugs.
>
> But that's just my take on it.

That's it all right. To be more specific, by sticking with the rule
that each allocated extent has exactly one pointer to it, we bypass a
whole class of complexity and associated bugs. When versioning is added
using the versioned pointers method (versioned extents, versioned
attributes) we still keep the single pointer per extent model. At that
point, we have snapshotting in a nice flexible form including writeable
snapshots of snapshots, without elaborating the Tux3 structural model
at all. Only the btree leaf block scanning and editing code changes.

We will use our user space unit testing strategy to handle the
additional leaf handling complexity, to give us the large number of
development and testing iterations that are necessary to make code of
that nature work really reliably. A large number of unit testing
iterations also helps code settle down to a relatively simple form.
Look in version.c and check out the unit testing there to see what I
mean: it implements a random fuzz tester to beat heavily on corner
cases, trying out millions of combinations in a few seconds and
checking for correctness at every step. Of course, this is no
substitute for thinking deeply about what is going on, but it is a
powerful tool for catching issues that slip through the net of pure
reasoning.

When we add the additional versioning complexity to the ileaf and dleaf
processing code, we will have another layer of unit testing at the leaf
level. What this means is that to implement versioning, we combine two
well tested components: our classic single-referenced filesystem design
and versioning logic that stays strictly within the dleaf processing.
We therefore hope that the vast majority of bugs will be caught by Tux3
developers in unit testing and not by users in full-system testing.

Now, single referencing does not immediately support data
de-duplication and pointer techniques to avoid file copies.
But it does support snapshotting, and should make it easier to do
online expand, shrink and checking reliably. These are the must-have
features that are currently deficient in Linux, and are real
impediments for Linux storage. I respect and admire those developers
who are willing to jump in and tackle those other cool features, but to
get where we need to be in the Linux storage space, our little group
needs to stay focussed on essentials.

That said, we will eventually elaborate the Tux3 allocation model to
add an allocation btree as a complement to the bitmap table. I have
written a little bit about this previously. The executive summary is:
for highly fragmented filesystems, bitmap allocation is more efficient
than extents (up to 50 times more space efficient) while for large
files on unfragmented filesystems, extent allocation is much more
efficient. The efficiency equation is compelling enough to justify some
extra complexity in order to switch between them, depending on observed
allocation statistics. The point of this is, when extent allocation
arrives, we can have reference counts on the extents and use that to
implement such things as de-duplication. Future fanciness.

If somebody wanted to work on de-duplication right now, I would
recommend using a per-block reference count table mapped into a file,
like the xattr atom refcounting we already have. This is not the most
efficient reference counting mechanism in the world, but it will work
fine for testing algorithms and proving the worth of the feature.

Regards,

Daniel
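
For anyone who wants to experiment with the per-block reference count
table idea, here is a minimal user space sketch in Python. It is only
an illustration under stated assumptions, not Tux3 code: the class name
BlockRefcounts, the 16-bit little-endian counter width, and the flat
one-counter-per-block file layout are all hypothetical, and it does not
reproduce the layout of the existing xattr atom refcounting.

```python
import mmap
import os
import struct
import tempfile

COUNT_FMT = "<H"                         # assumed: one 16-bit counter per block
COUNT_SIZE = struct.calcsize(COUNT_FMT)
COUNT_MAX = 0xFFFF


class BlockRefcounts:
    """Hypothetical per-block reference count table mapped from a file."""

    def __init__(self, path, blocks):
        self.blocks = blocks
        # Creates (or truncates) the table file; zero fill means every
        # block starts out unreferenced.
        self.file = open(path, "w+b")
        self.file.truncate(blocks * COUNT_SIZE)
        self.map = mmap.mmap(self.file.fileno(), blocks * COUNT_SIZE)

    def _offset(self, block):
        if not 0 <= block < self.blocks:
            raise IndexError("block out of range")
        return block * COUNT_SIZE

    def count(self, block):
        """Return the current reference count of a block."""
        return struct.unpack_from(COUNT_FMT, self.map, self._offset(block))[0]

    def get(self, block):
        """Take one more reference to a block and return the new count."""
        value = self.count(block)
        if value == COUNT_MAX:
            raise OverflowError("reference count would overflow")
        struct.pack_into(COUNT_FMT, self.map, self._offset(block), value + 1)
        return value + 1

    def put(self, block):
        """Drop one reference; a return of 0 means the block can be freed."""
        value = self.count(block)
        if value == 0:
            raise ValueError("block has no references")
        struct.pack_into(COUNT_FMT, self.map, self._offset(block), value - 1)
        return value - 1

    def close(self):
        self.map.close()
        self.file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = BlockRefcounts(os.path.join(tmp, "refcounts"), blocks=1024)
        table.get(7)            # first file points at block 7
        table.get(7)            # a de-duplicated second file shares it
        table.put(7)            # first file deleted, block still in use
        print(table.count(7))   # prints 1
        table.close()
```

A table like this spends a fixed two bytes on every block whether or
not the block is shared, which is the inefficiency mentioned above, but
it keeps the experiment entirely separate from the single pointer per
extent structure.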
