On Fri, 12 Sep 2014 19:38:53 +0100 Simon Slavin <slav...@bigfraud.org> wrote:
> I don't think it can be done by trying to build it on top of an
> existing file system. I think we need a file system (volume format,
> drivers, etc.) built from the ground up with
> atomicity/ACID/transactions in mind. Since the greatest of these is
> transactions, I want a transactional file system.

Funny you should mention that. About six years ago Kirk McKusick gave a presentation on then-recent updates to FFS in FreeBSD, including the birthdate. Among other things, I remember he explored using a tree instead of an array for a directory file, but found that because the vast majority of directories hold only a few names, overall performance is better with a simple array. I asked your question: why not add transactions to FFS? His answer: that's the province of a database.

The filesystem guys I've talked to all consider transactions a high-level construct that most applications hardly ever need. They're interested in raw speed and a consistent file structure (metadata) after a crash. They feel that just making one disk look like another is hard enough.

On the evidence, they're right. SQLite, for example, makes precious little use of the filesystem, insofar as the whole database is one file. Within the database there are no directories, no filesystem metadata, no inodes, nothing for the OS to virtualize or arbitrate between users. Apart from ancillary files, about the only things the FS supplies are extent management for blocks and user convenience. I don't have to explain to you why even the extent management it provides is suboptimal from the DBMS's point of view.

Maybe we should go back to the future. You remember when DBMSs didn't use the filesystem, but acted on (the kernel's abstraction of) the device directly. Implement a block-transaction store on the device itself: no inodes, no directories, just writeable blocks managed in transactions. Build your DBMS on that. Use the DBMS to build a user-space filesystem.
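To make the "block-transaction store" idea concrete, here is a minimal sketch in Python, for illustration only. Everything in it is hypothetical: the names BlockDevice, BlockStore, and Transaction, the 512-byte block size, and the redo-journal commit scheme are assumptions, and the "device" is an in-memory list rather than a raw disk. It shows the shape of such an interface: numbered blocks with no inodes or directories, writes buffered per transaction, and a commit that records the whole change set before applying it, so that an interrupted commit could be finished by replaying the journal.

```python
from typing import Dict, Optional

BLOCK_SIZE = 512  # assumed block size; real devices vary


class BlockDevice:
    """Hypothetical stand-in for a raw device: a flat array of fixed-size
    blocks. No inodes, no directories, just numbered blocks."""

    def __init__(self, nblocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(nblocks)]


class BlockStore:
    """Applies each transaction's writes all-or-nothing via a redo journal."""

    def __init__(self, device: BlockDevice) -> None:
        self.device = device
        # Redo journal. On a real device this would live in reserved blocks
        # and be flushed to stable storage before any data block changes.
        self.journal: Optional[Dict[int, bytes]] = None

    def begin(self) -> "Transaction":
        return Transaction(self)

    def log(self, changes: Dict[int, bytes]) -> None:
        # Step 1: record the complete change set before touching data blocks.
        self.journal = dict(changes)

    def checkpoint(self) -> None:
        # Step 2: copy journaled blocks into place, then clear the journal.
        if self.journal is not None:
            for n, data in self.journal.items():
                self.device.blocks[n] = data
            self.journal = None

    def recover(self) -> None:
        # After a crash, a surviving journal means the commit was logged but
        # maybe not applied: replay it. No journal means nothing to redo.
        self.checkpoint()


class Transaction:
    def __init__(self, store: BlockStore) -> None:
        self.store = store
        self.pending: Dict[int, bytes] = {}  # block number -> new contents
        self.active = True

    def _check_active(self) -> None:
        if not self.active:
            raise RuntimeError("transaction already finished")

    def read(self, n: int) -> bytes:
        # A transaction sees its own uncommitted writes first.
        return self.pending.get(n, self.store.device.blocks[n])

    def write(self, n: int, data: bytes) -> None:
        self._check_active()
        if not 0 <= n < len(self.store.device.blocks):
            raise IndexError(f"no such block: {n}")
        if len(data) > BLOCK_SIZE:
            raise ValueError("data larger than one block")
        # Buffer a zero-padded full block; the device is untouched until commit.
        self.pending[n] = data.ljust(BLOCK_SIZE, b"\0")

    def commit(self) -> None:
        self._check_active()
        self.store.log(self.pending)
        self.store.checkpoint()
        self.active = False

    def abort(self) -> None:
        self._check_active()
        self.pending.clear()
        self.active = False
```

For example, after `t = store.begin(); t.write(3, b"hello")`, block 3 on the device is unchanged until `t.commit()`, and `t.abort()` discards the write entirely. A real store would also need to force the journal to stable storage before checkpointing, arbitrate concurrent transactions, and manage journal space; none of that is attempted here.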
With a little cleverness, the DBMS and the user-space filesystem could be mutually intelligible, such that tables looked like files and find(1) would locate data in the database.

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users