Re: [sqlite] presentation about ordering and atomicity of filesystems

Simon Slavin Fri, 12 Sep 2014 17:09:07 -0700

On 13 Sep 2014, at 12:47am, James K. Lowden <jklow...@schemamania.org> wrote:


> The filesystem guys I've talked to all consider transactions to be a
> high-level construct hardly ever needed by most applications.  They're
> interested in raw speed and a consistent file structure (metadata)
> after a crash.  They feel that just making one disk look like another
> is hard enough.  
> 
> On the evidence, they're right.

Programmers don't expect file services to support transactions because file 
services have never supported transactions.  We're not used to specifying, when 
we write to one or many files, that a bunch of changes all go together.  But 
when it becomes possible everyone will rave about it.  I come from a banking 
background and I can imagine the transports of delight the financial techies 
will go through when they can assure their users that a crash in software will 
never lead to an inconsistent state: if one account was debited, the other will 
definitely be credited, and the transaction will definitely be in the ledger.  
No need to run consistency checks every time even the simplest application 
crashes.
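
As a rough illustration of the all-or-nothing behaviour described above, here is a minimal sketch using Python's sqlite3 module. It shows the guarantee at the database level (which SQLite already provides inside one file), not at the filesystem level. The table names, account names and the transfer() helper are invented for this example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Debit src, credit dst and record the move in the ledger, all or nothing."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.execute("INSERT INTO ledger (src, dst, amount) VALUES (?, ?, ?)",
                     (src, dst, amount))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            # Raising here undoes the debit, the credit and the ledger row together.
            raise ValueError("insufficient funds")

def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("CREATE TABLE ledger (src TEXT, dst TEXT, amount INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

Calling transfer(conn, "alice", "bob", 30) moves the money and adds one ledger row. A transfer that would overdraw the source account raises, and none of the three changes survives.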

> SQLite, for example, makes precious
> little use of the filesystem, insofar as the whole database is one
> file.

Except ... one thing that annoys me about SQLite is that it needs to make a 
journal file which isn't part of the database file.  Why?  Why can't it just 
write the journal to the database file it already has open?  This would avoid 
the problems where the OS prevents an application from creating a new file 
because of permissions or sandboxing.
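
For anyone who wants to see the external journal for themselves, here is a minimal sketch, assuming Python's sqlite3 module and an ordinary rollback-journal build of SQLite. The function name journal_visible and the table t are invented for this example. PRAGMA journal_mode=MEMORY is a real SQLite setting that keeps the journal out of the filesystem, but the SQLite documentation warns that a crash in the middle of a transaction in that mode is likely to corrupt the database, so it is a workaround for sandboxing and not a substitute for an in-file journal.

```python
import os
import sqlite3
import tempfile

def journal_visible(mode):
    """Return True if a '-journal' file sits beside the database mid-transaction."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.db")
        # isolation_level=None: Python issues no implicit BEGIN, so we control it.
        conn = sqlite3.connect(path, isolation_level=None)
        try:
            conn.execute(f"PRAGMA journal_mode={mode}")
            conn.execute("CREATE TABLE t (x INTEGER)")
            conn.execute("BEGIN")
            conn.execute("INSERT INTO t VALUES (1)")
            # The transaction is still open, so any rollback journal exists now.
            seen = os.path.exists(path + "-journal")
            conn.execute("ROLLBACK")
            return seen
        finally:
            conn.close()
```

On a typical build I would expect journal_visible("delete") to find the file and journal_visible("memory") not to.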

Similarly, temporary indexes and temporary tables (I think) also go in external 
files.  I don't see why, if they're part of 'main', they can't go in the main 
file.
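
A small check of where temporary tables actually live, again as a sketch using Python's sqlite3 module. The table name scratch and the helper table_names are invented for this example. It shows that a TEMP table is catalogued in the separate 'temp' schema and not in 'main', and that PRAGMA temp_store = MEMORY (a real SQLite setting) asks SQLite to hold temporary tables and indexes in RAM, which sidesteps the external file without putting them in the main file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Must be set before any temporary objects exist; changing it later discards them.
conn.execute("PRAGMA temp_store = MEMORY")
conn.execute("CREATE TEMP TABLE scratch (x INTEGER)")

def table_names(conn, catalog):
    """List table names recorded in the given catalog table."""
    sql = f"SELECT name FROM {catalog} WHERE type = 'table'"
    return [row[0] for row in conn.execute(sql)]

main_tables = table_names(conn, "sqlite_master")       # catalog of 'main'
temp_tables = table_names(conn, "sqlite_temp_master")  # catalog of 'temp'
temp_store = conn.execute("PRAGMA temp_store").fetchone()[0]  # 2 means MEMORY
```

Here main_tables comes back empty while temp_tables contains scratch.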

> Maybe we should go back to the future. You remember when DBMSs didn't
> use the filesystem, but acted on (the kernel's abstraction of) the
> device directly.  Implement a block-transaction store on the device
> itself: no inodes, no directories, just writeable blocks managed in
> transactions.  Build your DBMS on that.  Use the DBMS to build a
> user-space filesystem. With a little cleverness, they could be mutually
> intelligible, such that tables looked like files and find(1) would
> locate data in the database.  

That would be ... erm ... perhaps a new disk volume format, where the blocks 
of the volume are pages of a SQLite database: one in which VACUUM could never 
shrink the file (but perhaps could be subverted to do something like 
defragmentation).

It would remove one level of abstraction between the OS and the bits on the 
disk surface.  And that's always good.  It would still rely on the disk driver 
correctly supporting flush and all other flush-like operations the OS 
implemented.  Most of them will when the jumpers on the drive are set correctly 
and the driver mounts the volume correctly.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
