On Wed, Oct 24, 2012 at 7:17 PM, Simon Slavin <slav...@bigfraud.org> wrote:
> A) fsync() doesn't work the way it's meant to on the majority of user 
> platforms.  It effectively does nothing.  Here are typical notes for Windows 
> Server and FreeBSD:

Many systems lie, that's true.  For example, VirtualBox by default
lies about cache flushes, and consumer hardware typically does as
well.  The systems I'm familiar with implement fsync() correctly as
long as the hardware doesn't lie.  (Nothing much can be done about
lying hardware, especially if the lies go beyond merely not flushing
caches.  Though if cache flushing is the only thing the hardware lies
about then the OS/filesystem can implement a technique for recovery
like the one I described.  Indeed, ZFS does just that.)
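
For concreteness, here is roughly what relying on fsync() looks like
from the application side.  This is a minimal sketch in Python, not
anything SQLite does verbatim; write_durably is a hypothetical helper
name, and the flush is only as trustworthy as the OS, filesystem, and
hardware underneath it.

```python
import os

def write_durably(path, data):
    """Write data to path, then ask the OS to push it to stable storage.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Returns once the OS reports the flush as done.  If a layer
        # below lies about cache flushes, this still returns normally.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Note that a successful return tells the application nothing about
whether a lower layer lied, which is why a recovery scheme is still
needed on top.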

But the point is that Richard asked for a light-weight barrier API and
it exists as I described.  Any API explicitly designed for this
purpose could still be implemented incorrectly, or just lie through
its teeth.  SQLite can't help this.  SQLite *can* use available APIs:
when the OS/FS/HW don't lie, using these APIs is way better than not
using them, and if the OS/FS/HW lie, well, that's not SQLite's
problem.  At best SQLite could mitigate the lies by... doing what I
suggested: keep around N non-garbage-collected most recent
transactions so that the most recent transaction that can be validated
(meaning its writes hit disk) is taken as the current state of the DB.
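
To make the idea concrete, here is a minimal sketch in Python of
"take the newest transaction that validates".  It assumes each
retained transaction is stored as a record carrying a sequence number
and a CRC32; the record layout and the names encode_txn, decode_txn,
and recover are made up for illustration and are not SQLite's actual
on-disk format.

```python
import struct
import zlib

# Assumed layout: sequence number (u64), payload length (u32),
# payload bytes, then a CRC32 (u32) over everything before it.
HEADER = struct.Struct("<QI")
TRAILER = struct.Struct("<I")

def encode_txn(seq, payload):
    """Serialize one transaction record with a trailing checksum."""
    body = HEADER.pack(seq, len(payload)) + payload
    return body + TRAILER.pack(zlib.crc32(body))

def decode_txn(record):
    """Return (seq, payload) if the record validates, else None."""
    if len(record) < HEADER.size + TRAILER.size:
        return None
    body, trailer = record[:-TRAILER.size], record[-TRAILER.size:]
    (crc,) = TRAILER.unpack(trailer)
    if zlib.crc32(body) != crc:
        return None  # torn write, or data that never reached disk
    seq, length = HEADER.unpack_from(body)
    payload = body[HEADER.size:]
    if len(payload) != length:
        return None  # truncated or over-long record
    return seq, payload

def recover(slots):
    """Pick the highest-sequence record that validates among the
    N retained transaction slots; None if nothing validates."""
    best = None
    for record in slots:
        txn = decode_txn(record)
        if txn is not None and (best is None or txn[0] > best[0]):
            best = txn
    return best
```

With three retained records where the newest one was cut short, e.g.
recover([encode_txn(1, b"a"), encode_txn(2, b"b"),
encode_txn(3, b"c")[:-1]]), the result is (2, b"b"): transaction 3
fails validation, so transaction 2 becomes the current state.  A
checksum only detects incomplete records; it cannot detect hardware
that lies in other ways.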

Nico

PS: Typically OSes implement fsync(), and all other filesystem system
calls, via a VFS switch, so the actual implementation of fsync() generally
depends on the actual filesystem in addition to the OS and the
hardware.  A filesystem like a traditional UFS might correctly flush
caches and so on and yet fail to implement fsync() as a Durability
guarantee on account of not having a COW structure on disk, such that
a power failure in the middle of subsequent writes can leave the
filesystem inconsistent.  A filesystem like ZFS doesn't have this
problem.