Re: [sqlite] presentation about ordering and atomicity of filesystems

James K. Lowden Fri, 12 Sep 2014 16:48:53 -0700

On Fri, 12 Sep 2014 19:38:53 +0100
Simon Slavin <slav...@bigfraud.org> wrote:


> I don't think it can be done by trying to build it on top of an
> existing file system.  I think we need a file system (volume format,
> drivers, etc.) built from the ground up with
> atomicity/ACID/transactions in mind.  Since the greatest of these is
> transactions, I want a transactional file system.

Funny you should mention that.  About six years ago Mike McKusick gave a
presentation on then-recent updates to FFS in FreeBSD, including the
file birth time.  Among other things, I remember he explored using a tree
instead of an array for a directory file, but found that because the
vast majority of directories hold a small number of names, overall
performance is better with a simple array.
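For a directory holding a handful of names, lookup in a flat array is
just a linear scan: no node allocation, no rebalancing, no pointer
chasing.  A tree only starts to pay off once that scan gets long.  A
minimal sketch, assuming a simplified entry of (name, inode number);
real FFS directory entries are variable-length records, and `DirEntry`
and `lookup` are hypothetical names used only for illustration:

```python
from typing import List, NamedTuple, Optional


class DirEntry(NamedTuple):
    # Hypothetical, simplified entry: real FFS records also carry
    # a record length, a name length, and a file type.
    name: str
    ino: int


def lookup(entries: List[DirEntry], name: str) -> Optional[int]:
    """Linear scan of an array-style directory.

    Returns the inode number for `name`, or None if it is absent.
    For small directories this touches only a few entries.
    """
    for entry in entries:
        if entry.name == name:
            return entry.ino
    return None


if __name__ == "__main__":
    d = [DirEntry(".", 2), DirEntry("..", 2), DirEntry("etc", 5)]
    print(lookup(d, "etc"))
```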

I asked your question: why not add transactions to FFS?  

His answer: that's the province of a database.  

The filesystem guys I've talked to all consider transactions to be a
high-level construct hardly ever needed by most applications.  They're
interested in raw speed and a consistent file structure (metadata)
after a crash.  They feel that just making one disk look like another
is hard enough.  

On the evidence, they're right.  SQLite, for example, makes precious
little use of the filesystem, insofar as the whole database is one
file.  Within the database, there are no directories, no filesystem
metadata, no inodes, nothing for the OS to virtualize or arbitrate
between users.  Apart from ancillary files (the rollback journal or
WAL), about the only things the FS supplies are block extent management
and user convenience.  I don't have to explain to you why even the
extent management it provides is suboptimal from the point of view of
the DBMS.

Maybe we should go back to the future. You remember when DBMSs didn't
use the filesystem, but acted on (the kernel's abstraction of) the
device directly.  Implement a block-transaction store on the device
itself: no inodes, no directories, just writeable blocks managed in
transactions.  Build your DBMS on that.  Use the DBMS to build a
user-space filesystem. With a little cleverness, they could be mutually
intelligible, such that tables looked like files and find(1) would
locate data in the database.  
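To make the "block-transaction store" idea concrete, here is a toy,
in-memory sketch: fixed-size numbered blocks, no inodes or directories,
and changes visible only on commit.  It assumes a shadow-copy scheme
(stage writes, build a new block table, swap it in one step); the class
names, `BLOCK_SIZE`, and the method names are all hypothetical.  A real
store on a device would also need a journal or shadow paging plus write
barriers to survive a crash, which this sketch does not attempt.

```python
from typing import Dict, List

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


class BlockStore:
    """Toy block-transaction store: numbered blocks, changed only via transactions."""

    def __init__(self, nblocks: int) -> None:
        self.blocks: List[bytes] = [bytes(BLOCK_SIZE) for _ in range(nblocks)]

    def begin(self) -> "Transaction":
        return Transaction(self)


class Transaction:
    def __init__(self, store: BlockStore) -> None:
        self.store = store
        self.pending: Dict[int, bytes] = {}  # block number -> staged contents
        self.done = False

    def read(self, n: int) -> bytes:
        # A transaction sees its own staged writes before committed data.
        return self.pending.get(n, self.store.blocks[n])

    def write(self, n: int, data: bytes) -> None:
        if self.done:
            raise RuntimeError("transaction already finished")
        if not 0 <= n < len(self.store.blocks):
            raise IndexError(n)
        if len(data) > BLOCK_SIZE:
            raise ValueError("data larger than one block")
        # Pad to a full block; nothing reaches the store until commit().
        self.pending[n] = data.ljust(BLOCK_SIZE, b"\0")

    def commit(self) -> None:
        if self.done:
            raise RuntimeError("transaction already finished")
        # Shadow copy: build a new block table, then swap the reference
        # in one assignment, so readers see all of the changes or none.
        new_blocks = list(self.store.blocks)
        for n, data in self.pending.items():
            new_blocks[n] = data
        self.store.blocks = new_blocks
        self.done = True

    def rollback(self) -> None:
        # Discard staged writes; the store is untouched.
        self.pending.clear()
        self.done = True


if __name__ == "__main__":
    store = BlockStore(8)
    t = store.begin()
    t.write(3, b"hello")
    t.commit()
    print(store.blocks[3][:5])
```

A DBMS would sit on top of `begin`/`write`/`commit`, and a user-space
filesystem on top of the DBMS, as described above.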

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
