On Thu, Sep 11, 2014 at 5:49 PM, Kees Nuyt <k.n...@zonnet.nl> wrote:

> Hi all,
>
> Today I bumped into a presentation about ordering and atomicity
> of filesystems that might interest you.
>
> https://www.youtube.com/watch?v=YvchhB1-Aws
>
> The Application/Storage Interface: After All These Years, We're
> Still Doing It Wrong
> Remzi Arpaci-Dusseau, University of Wisconsin-Madison
>
> Talk at usenix 2014 Published on Sep 4, 2014 by USENIX
> Association Videos
>
> Somewhat related to the article drh recently wrote about using
> sqlite as an application data store.
>

Thanks for the link, Kees!
I just finished watching the video. Remzi Arpaci-Dusseau talks about research (done by him and his graduate students) into how well application data survives system crashes. Remzi observes that filesystem developers have worked very hard for many years to ensure that filesystem metadata is preserved in a crash, but they seem less concerned about protecting application data.

Remzi developed tools (BOB and ALICE) to study various workloads and see how vulnerable they are to system crashes. He looked at various "applications". His definition of "application" includes standalone programs like Git and Hg, database servers like PostgreSQL, and libraries like SQLite and LevelDB. At one point he shows a chart that counts the number of unwarranted assumptions that the applications make about filesystem behavior. Such unwarranted assumptions can lead to corruption following a system crash (or power loss). SQLite and PostgreSQL came out on top, with just one vulnerability each. Hg and Git each had many vulnerabilities. In fairness, Remzi points out that these vulnerabilities assume a "worst case" filesystem and that many of them might not exist on a modern filesystem like EXT4.

Remzi: I would very much like to learn more about that one unwarranted durability assumption that you contend SQLite is making.

That SQLite does well in an analysis using ALICE and BOB is not really surprising. It turns out that we SQLite developers have our own ALICE- and BOB-like tools that we have implemented using custom VFSes. We have three of them, actually, implemented at different times, by both me and Dan. (Only two are BOB- and ALICE-like crash simulators - the third tool is an invariant checker that helps us to prove crashes are recoverable.) We run many cycles of all three prior to every release, looking for crash vulnerabilities.
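
To make the idea of a crash simulator concrete, here is a toy sketch in Python. It is not SQLite's test VFS and it is not ALICE or BOB; the class `CrashSimFile`, the helper names, and the two-key "database" are all hypothetical. The model assumes that any subset of the writes issued since the last fsync() may have reached disk when power fails, and it checks whether journal-based recovery restores an invariant in every such state.

```python
from itertools import combinations


class CrashSimFile:
    """Hypothetical toy store behind a write-back cache (not SQLite's VFS)."""

    def __init__(self):
        self.durable = {}   # key -> value, guaranteed to be on disk
        self.pending = []   # (key, value) pairs written but not yet synced

    def write(self, key, value):
        self.pending.append((key, value))

    def fsync(self):
        # Everything written so far becomes durable.
        for key, value in self.pending:
            self.durable[key] = value
        self.pending.clear()

    def crash_states(self):
        """Yield every disk image that could exist if power failed right now.

        Assumption of this model: any subset of the pending writes may
        have reached disk.
        """
        for r in range(len(self.pending) + 1):
            for subset in combinations(self.pending, r):
                image = dict(self.durable)
                for key, value in subset:
                    image[key] = value
                yield image


def recover(image):
    """Roll back to the journaled values if a journal survived the crash."""
    image = dict(image)
    journal = image.get("journal")
    if journal is not None:
        image.update(journal)
        image["journal"] = None
    return image


def invariant(image):
    # The toy "database" is consistent when both keys hold the same value.
    return image.get("a") == image.get("b")


def transaction(sync_journal):
    """Return the steps of one transaction that changes a and b from 0 to 1."""
    ops = [lambda f: f.write("journal", {"a": 0, "b": 0})]
    if sync_journal:
        ops.append(lambda f: f.fsync())   # journal durable before data changes
    ops += [
        lambda f: f.write("a", 1),
        lambda f: f.write("b", 1),
        lambda f: f.fsync(),
        lambda f: f.write("journal", None),   # commit: discard the journal
        lambda f: f.fsync(),
    ]
    return ops


def count_bad_crashes(ops):
    """Crash after every step, in every possible state; count broken recoveries."""
    f = CrashSimFile()
    f.write("a", 0)
    f.write("b", 0)
    f.write("journal", None)
    f.fsync()
    bad = 0
    for op in ops:
        op(f)
        for image in f.crash_states():
            if not invariant(recover(image)):
                bad += 1
    return bad


if __name__ == "__main__":
    print("with journal fsync:   ", count_bad_crashes(transaction(True)))
    print("without journal fsync:", count_bad_crashes(transaction(False)))
```

In this model the transaction that syncs its journal before touching the data recovers correctly from every simulated crash, while the variant that skips that fsync() has crash states in which one key changed and no journal survived to undo it. A missing sync of that kind is the sort of unwarranted assumption a crash simulator is meant to expose.
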
If SQLite really is making an unwarranted durability assumption, as Remzi contends, then that points to a deficiency in our three crash analyzers, which is something we would like to fix.

Remzi also talks about the idea of a new system call that he refers to as "osync()" that causes I/O operations to be ordered. I've been saying much the same thing, for years, to anybody who would listen, though I've been calling the system call a "write barrier". The idea is that if you could replace fsync() with the write barrier, you would lose durability (which few people really care about) but gain a lot of performance. Remzi shows a test case using SQLite where using osync() instead of fsync() results in a ten-fold performance improvement.

--
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users