On Sunday, 2009-12-27, at 19:19, Shawn Willden wrote:

> Indeed, the files are just as much deleted as they are in any Unix
> file system. The only difference is that in a Tahoe grid garbage
> collection is much slower (really slow if the storage nodes have GC
> turned off).

It's true that this same issue is present in any Unix filesystem, but the speed of garbage collection is not the only difference. An important difference is that every Unix filesystem disallows hard links to directories. (An exception that proves this rule is that Apple recently extended HFS to allow hardlinks to directories, but only with some specific limitations intended to prevent cycles, and only to support Time Machine backups.) Also, non-Unix filesystems such as Windows and pre-Unix Mac disallow hardlinks to directories, and even hardlinks to files.

This makes me suspect that the designers of those systems had good reasons for this, and that the way Tahoe-LAFS gaily allows hardlinks to any object is probably an example of fools rushing in where angels fear to tread. That is: users are inherently confused by a "path-based filesystem" abstraction or a "folders-and-documents" abstraction built on top of an arbitrary directed graph. The most successful filesystem products try to hide the arbitrary graph layer as much as possible, whereas Tahoe-LAFS tries to expose it as much as possible.

Further cause for concern: many Unix users, even "power users", try to avoid the use of hardlinks whenever possible, considering them a confusing and error-prone feature.

Pretty gloomy picture.

But there is hope: The Web!

Suppose that users, instead of thinking of their Tahoe-LAFS-hosted files and their Tahoe-LAFS directories as being part of a "folders-and-documents" abstraction or a unixy path-based "filesystem", thought of them as a collection of web pages which could have hyperlinks to one another. Then there is no more "impedance mismatch" between the abstraction in the user's head and the underlying graph structure. No user is ever surprised that multiple web pages can point to the same web page, or that following a series of hyperlinks can take you in a circle.
No software intended for the Web assumes that the set of web pages that it will visit forms a perfectly hierarchical tree structure without cycles or converging links. Basically, the Web has proven to be both a more powerful and a more user-friendly abstraction for managing collections of documents than the old path-based filesystem abstraction or the old folders-and-documents abstraction.

Regards,

Zooko

P.S. My brother Nejucomo says that it should be named "tahoe unlink" instead of "tahoe rm". I think that would be a good usability improvement.

P.P.S. My wife Amber says that the only reason people limited filesystems to a tree structure is that many important algorithms that you might want to use on your filesystem would be inefficient on non-tree structures, but now that the Web has formed itself as a non-tree structure we have been forced to develop heuristics and work around such inefficiencies anyway.

P.P.P.S. See Mark Bernstein's blog entry on how Engelbart's vision for what is now The Web rested on a hierarchical principle, while Nelson's held that "everything is deeply intertwingled": http://www.markbernstein.org/Feb0301/Engelbart.html
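
To make the point about cycles and converging links concrete, here is a minimal sketch (not Tahoe-LAFS code) of how a web-style tool walks an arbitrary directed graph: it remembers what it has already visited instead of assuming a tree. The `graph` dict and the `reachable` function are hypothetical names invented for this illustration.

```python
from collections import deque


def reachable(graph, root):
    """Return every node reachable from root, each exactly once.

    graph maps a node name to the list of nodes it links to
    (a hypothetical stand-in for directories and their children).
    """
    seen = {root}          # nodes already discovered
    queue = deque([root])  # breadth-first frontier
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph.get(node, ()):
            # Skipping already-seen nodes is what makes cycles and
            # converging links harmless.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


# "docs" is linked from two parents, and it links back to "root".
graph = {
    "root": ["home", "shared"],
    "home": ["docs"],
    "shared": ["docs"],
    "docs": ["root"],
}

print(reachable(graph, "root"))
```

A naive recursive tree walk over the same graph would never terminate, because "docs" leads back to "root"; the `seen` set is the kind of workaround the P.P.S. alludes to.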
