Re: [sqlite] presentation about ordering and atomicity of filesystems

Richard Hipp Thu, 11 Sep 2014 21:08:57 -0700

On Thu, Sep 11, 2014 at 5:49 PM, Kees Nuyt <k.n...@zonnet.nl> wrote:

>
> Hi all,
>
> Today I bumped into a presentation about ordering and atomicity
> of filesystems that might interest you.
>
> https://www.youtube.com/watch?v=YvchhB1-Aws
>
> The Application/Storage Interface: After All These Years, We're
> Still Doing It Wrong
> Remzi Arpaci-Dusseau, University of Wisconsin—Madison
>
> Talk at usenix 2014 Published on Sep 4, 2014 by USENIX
> Association Videos
>
> Somewhat related to the article drh recently wrote about using
> sqlite as an application data store.
>
>
Thanks for the link, Kees!


I just finished watching the video.  Remzi Arpaci-Dusseau talks about
research (done by him and his graduate students) into how well application
data survives system crashes.  Remzi observes that filesystem developers
have worked very hard for many years ensuring that filesystem metadata is
preserved in a crash, but they seem less concerned about protecting
application data.

Remzi developed tools (BOB and ALICE) to study various workloads to see how
vulnerable they were to system crashes.  He looked at various
"applications".  His definition of "application" includes standalone
programs like Git and Hg, and database servers like PostgreSQL, and
libraries like SQLite and LevelDB.  At one point he shows a chart that
counts the number of unwarranted assumptions that the applications make
about filesystem behavior.  Such unwarranted assumptions can lead to
corruption following a system crash (or power loss).

SQLite and PostgreSQL came out on top, with just one vulnerability each.
Hg and Git each had many vulnerabilities.  In fairness, Remzi points out
that these vulnerabilities assume a "worst case" filesystem and that many
of them might not exist on a modern filesystem like EXT4.

Remzi:  I would very much like to learn more about that one unwarranted
durability assumption that you contend SQLite is making.

That SQLite does well in an analysis using ALICE and BOB is not really
surprising.  It turns out that we SQLite developers have our own ALICE- and
BOB-like tools that we have implemented using custom VFSes.  We have three
of them, actually, implemented at different times, by both me and Dan.
(Only two are BOB- and ALICE-like crash simulators - the third tool is an
invariant checker that helps us to prove crashes are recoverable.)  We run
many cycles of all three prior to every release, looking for crash
vulnerabilities.  If SQLite really is making an unwarranted durability
assumption, as Remzi contends, then that points to a deficiency in our
three crash analyzers, which is something we would like to fix.
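
For readers unfamiliar with the technique, the general idea behind a crash
simulator plus invariant checker can be sketched roughly as follows.  This is
an illustration only, not the SQLite test VFS code: the SimDisk type, the
sim_* functions, and the toy "commit flag implies payload" invariant are all
hypothetical names made up for this sketch.  It buffers writes until a sync,
then enumerates every subset of unsynced writes that might have survived a
crash and counts the resulting images that violate the invariant.

```c
#include <stdbool.h>
#include <string.h>

#define DISK_SIZE   16
#define MAX_PENDING 8

/* Hypothetical simulated disk: a durable image plus writes not yet synced. */
typedef struct {
    unsigned char durable[DISK_SIZE];
    struct { int offset; unsigned char value; } pending[MAX_PENDING];
    int n_pending;
} SimDisk;

void sim_init(SimDisk *d) { memset(d, 0, sizeof *d); }

/* Buffer a one-byte write; it is not durable until sim_sync() is called. */
bool sim_write(SimDisk *d, int offset, unsigned char value) {
    if (offset < 0 || offset >= DISK_SIZE || d->n_pending >= MAX_PENDING)
        return false;
    d->pending[d->n_pending].offset = offset;
    d->pending[d->n_pending].value = value;
    d->n_pending++;
    return true;
}

/* Plays the role of fsync(): every pending write reaches the durable image. */
void sim_sync(SimDisk *d) {
    for (int i = 0; i < d->n_pending; i++)
        d->durable[d->pending[i].offset] = d->pending[i].value;
    d->n_pending = 0;
}

/* Build the image left by a crash in which only the pending writes
 * selected by the bits of `mask` made it to disk. */
void sim_crash_image(const SimDisk *d, unsigned mask,
                     unsigned char out[DISK_SIZE]) {
    memcpy(out, d->durable, DISK_SIZE);
    for (int i = 0; i < d->n_pending; i++)
        if (mask & (1u << i))
            out[d->pending[i].offset] = d->pending[i].value;
}

typedef bool (*Invariant)(const unsigned char image[DISK_SIZE]);

/* Try every subset of pending writes; count images that break `ok`. */
int sim_count_bad_crashes(const SimDisk *d, Invariant ok) {
    int bad = 0;
    unsigned char image[DISK_SIZE];
    for (unsigned mask = 0; mask < (1u << d->n_pending); mask++) {
        sim_crash_image(d, mask, image);
        if (!ok(image)) bad++;
    }
    return bad;
}

/* Toy invariant: if the commit flag (byte 1) is set, the payload
 * (byte 0) must already hold the value 42. */
bool commit_implies_payload(const unsigned char image[DISK_SIZE]) {
    return image[1] == 0 || image[0] == 42;
}

/* Toy workload: write the payload, optionally sync, then write the flag.
 * Returns the number of crash states that violate the invariant. */
int bad_crashes_for_commit(bool sync_between) {
    SimDisk d;
    sim_init(&d);
    sim_write(&d, 0, 42);
    if (sync_between) sim_sync(&d);
    sim_write(&d, 1, 1);
    return sim_count_bad_crashes(&d, commit_implies_payload);
}
```

In this toy model, omitting the sync leaves exactly one bad crash state (the
commit flag reaches disk but the payload does not), while syncing between the
two writes leaves none.  A real tool would work at the level of sectors and
whole files and would run the application's actual recovery code rather than
a one-line invariant.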

Remzi also talks about the idea of a new system call that he refers to as
"osync()" that causes I/O operations to be ordered.  I've been saying much
the same thing, for years, to anybody who would listen, though I've been
calling the system call a "write barrier".  The idea is that if you could
replace fsync() with the write barrier, you would lose durability (which
few people really care about) but gain a lot more performance.  Remzi shows
a test case using SQLite where osync() instead of fsync() results in a
ten-fold performance improvement.
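
To make the ordering point concrete, here is a minimal sketch of a commit
sequence in which the ordering primitive is pluggable.  Note that osync() and
a write-barrier system call are proposals, not existing system calls, so the
only barrier supplied here is fsync(), which gives ordering and also pays for
durability.  The names barrier_fn, barrier_fsync, ordered_commit, and
commit_to_path are hypothetical and exist only for this illustration; it is
not how SQLite itself is written.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook: anything that guarantees "writes issued before this
 * call reach the disk before writes issued after it". */
typedef int (*barrier_fn)(int fd);

/* Today's portable stand-in for a barrier.  fsync() orders the writes but
 * also blocks until they are durable, which is the expensive part. */
int barrier_fsync(int fd) { return fsync(fd); }

/* Write the payload, then a commit marker, with a barrier between them so
 * the marker can never reach disk ahead of the payload.
 * Returns 0 on success, -1 on any error. */
int ordered_commit(int fd, const char *payload, barrier_fn barrier) {
    size_t len = strlen(payload);
    if (write(fd, payload, len) != (ssize_t)len) return -1;
    if (barrier(fd) != 0) return -1;      /* payload before marker */
    if (write(fd, "COMMIT\n", 7) != 7) return -1;
    return barrier(fd);                   /* marker before later writes */
}

/* Convenience wrapper: create or truncate `path` and commit `payload`. */
int commit_to_path(const char *path, const char *payload) {
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0) return -1;
    int rc = ordered_commit(fd, payload, barrier_fsync);
    if (close(fd) != 0) rc = -1;
    return rc;
}
```

If an ordering-only primitive existed, it could be passed in place of
barrier_fsync without changing ordered_commit().  The commit would then be
guaranteed consistent after a crash but not guaranteed to have survived it,
which is the trade-off described above.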

-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users