On 13 Sep 2014, at 12:47am, James K. Lowden <jklow...@schemamania.org> wrote:
> The filesystem guys I've talked to all consider transactions to be a
> high-level construct hardly ever needed by most applications. They're
> interested in raw speed and a consistent file structure (metadata)
> after a crash. They feel that just making one disk look like another
> is hard enough.
>
> On the evidence, they're right.

Programmers don't expect file services to support transactions because file services have never supported transactions. We're not used to specifying, when we write to one or many files, that a bunch of changes all go together. But when it becomes possible everyone will rave about it.

I come from a banking background and I can imagine the transports of delight the financial techies will go through when they can assure their users that a crash in software will never lead to an inconsistent state: if one account was debited, the other will definitely be credited, and the transaction will definitely be in the ledger. No need to run consistency checks every time even the simplest application crashes.

> SQLite, for example, makes precious
> little use of the filesystem, insofar as the whole database is one
> file.

Except ... one thing that annoys me about SQLite is that it needs to make a journal file which isn't part of the database file. Why? Why can't it just write the journal to the database file it already has open? This would reduce the problems where the OS prevents an application from creating a new file because of permissions or sandboxing.

Similarly, temporary indexes and temporary tables (I think) also go in external files. I don't see why, if they're part of 'main', they can't go in the main file.

> Maybe we should go back to the future. You remember when DBMSs didn't
> use the filesystem, but acted on (the kernel's abstraction of) the
> device directly. Implement a block-transaction store on the device
> itself: no inodes, no directories, just writeable blocks managed in
> transactions. Build your DBMS on that. Use the DBMS to build a
> user-space filesystem. With a little cleverness, they could be mutually
> intelligible, such that tables looked like files and find(1) would
> locate data in the database.

That would be ... erm ... perhaps a new disk volume format. Where the blocks of the volume were pages of a SQLite database. One in which VACUUM could never shrink the file (but perhaps could be subverted to do something like defragmentation).

It would remove one level of abstraction between the OS and the bits on the disk surface. And that's always good. It would still rely on the disk driver correctly supporting flush and all other flush-like operations the OS implemented. Most of them will when the jumpers on the drive are set correctly and the driver mounts the volume correctly.

Simon.
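
To make the banking example above concrete, here is a minimal sketch of the debit/credit/ledger guarantee as SQLite already provides it inside a single database. It uses Python's standard sqlite3 module. The table names (accounts, ledger), the helper functions and the SimulatedCrash exception are all hypothetical, invented for illustration, and the "crash" is only an exception raised between the two updates, not a real power loss or process kill.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Hypothetical stand-in for a software failure in the middle of a transfer."""


def make_bank():
    # In-memory database with two illustrative tables (names are assumptions).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("CREATE TABLE ledger (src TEXT, dst TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, crash_midway=False):
    # "with conn" commits if the block finishes and rolls back if it raises,
    # so the debit, the credit and the ledger row succeed or fail together.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        if crash_midway:
            raise SimulatedCrash("failure between debit and credit")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "INSERT INTO ledger (src, dst, amount) VALUES (?, ?, ?)",
            (src, dst, amount),
        )


def balances(conn):
    # Return {account name: balance} for easy comparison.
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 30) moves the money and writes one ledger row. Calling it again with crash_midway=True raises SimulatedCrash, and the debit that had already been applied is rolled back, so balances(conn) is unchanged. That is the all-or-nothing behaviour the paragraph above wishes file services offered for ordinary multi-file writes.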