On 18 Jun 2008, Juliusz Chroboczek outgrape: >>> All modern file-systems use either B-trees or B*-trees for >>> directories. > >> As an aside, ext[234]fs do not, nor will they ever as far as I know. FFS >> does not. A lot of Polipo-using systems probably run atop those :/ > > Indeed, and I cannot understand why filesystems still use linear > search in directories. I realise that system designers don't want to > use bleeding-edge algorithms, but hashtables have been known since the > late 1950s, B-tress since the early 1970s.
I think the constraints of NFS support (-> a 32-bit cookie must be able to hold all state, persistently, across reboots), and the horrible and almost totally unused seekdir()/telldir() syscalls have a nasty effect on tree-based filesystems. :/ plus, say what you like about ext2, it's simple and will work even on nanoscopic systems. (Of course, these days such systems probably don't have any rotating storage at all, making ext2 inappropriate...) >> (ext3fs and ext4fs have a hack whereby filenames are reordered such >> that they can be looked up by means of a hash lookup, which is just >> about as good for most purposes.) > > And modern versions of Berklix will load the whole directory in > memory, where it is stored as a hash table. But I still don't > understand why they're avoiding a proper data structure in the on-disk > format. Simplicity of recovery is part of it. ext[234] fsck is *very* good at recovering data from horrendously mangled filesystems, and we still have the horrible example of reiser3 fsck to tell us how not to do it. Everyone agrees, these days, that the right way to do this is to embed integrity management into the filesystem rather than needing anything that sweeps over the whole fs at any time, but it's hard to rejig existing fses that way. I'm not aware of any *new* filesystem development that's not tree-based: ext4 is an incremental development of ext3 rather than something new. Among other things, doing flash-based storage robustly pretty much requires tree-based filesystems working sort of like a cross between the old trees-on-tape schemes from Knuth (so the representation is linear and makes maximally efficient use of the relatively large eraseblock- sized writes) and the way functional languages do mutable graphs, with rewriting from the leaves up on changes (but I don't need to tell *you* about that!) (btrfs is especially interesting to me. Having unbounded numbers of nearly-free filesystem/directory snapshots seems neat.) >> That's the performance problem in Firefox, for what it's worth: the db >> fsync()s fairly frequency, > > I don't quite agree with your analysis. The performance problem with > Firefox is that it's written by a bunch of people who, when they have > a problem, think ``I know, I'll use a database.'' Now they have two > problems. Well, yeah, that too. I entirely agree with JWZ's opinions on this subject. (I do relational DB stuff for a living. I *know* how much it sucks.) > http://www.jwz.org/doc/mailsum.html But it's better than what they were using before! Mind you, anything else would be impossible. `mork' is the single most hairbrained file format I have ever heard of, bar none. It even beats financial market protocols, which generally start at `demented' and go up in aggravation from there. Read and weep: <http://jwz.livejournal.com/312657.html> ... anyway, this is way off topic for the list now and the hayfever drugs are making me babble, so having horrified you with that I'll shut up. ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php _______________________________________________ Polipo-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/polipo-users
