Re: [sqlite] presentation about ordering and atomicity of filesystems

Nico Williams Fri, 12 Sep 2014 11:41:53 -0700

On Fri, Sep 12, 2014 at 1:18 PM, Howard Chu <h...@symas.com> wrote:
> Nico Williams wrote:
>> On ZFS datasets with sync disabled, fsync() functions as osync(): as a
>> write barrier without durability and without the associated penalty.
>> The obvious problem is that we really do need both osync() and fsync();
>> just one or the other is not a reasonable compromise.
>
> Write barriers have been debated in Linux ad nauseam. I agree that osync()
> would be great to have, but it's still a die roll - the OS can flush blocks
> to the storage device in order, but without waiting for the storage device's
> buffer to empty, can't make any further ordering promises from there. You
> need device-level ordering support too. - which prompted my suggestion here


For ZFS there's no problem if this happens: you might lose whole
transactions, but the filesystem will remain consistent.  That's the
no-durability-guarantee part of a write barrier.
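
To make the barrier-versus-durability distinction concrete, here is a
minimal sketch of how an application typically uses fsync() as an ordering
point between a payload write and its commit record.  The names
ordered_append, write_all and journal.log are hypothetical, chosen only for
illustration.  Since osync() does not exist as a syscall, the sketch uses
fsync() alone; whether that call also waits for durability depends on the
filesystem configuration, as discussed above.

```python
import os
import tempfile


def write_all(fd, data):
    """Write all of `data` to `fd`, looping over partial writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def ordered_append(path, payload, commit):
    """Append `payload`, then `commit`, with fsync() as the ordering point.

    Hypothetical helper: the application only needs `commit` never to reach
    storage ahead of `payload`.  With default settings fsync() also waits
    for durability; on a ZFS dataset with sync disabled it would act only
    as a write barrier (an assumption taken from this thread, not checked
    by this code).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        write_all(fd, payload)
        os.fsync(fd)  # ordering point between payload and commit record
        write_all(fd, commit)
        os.fsync(fd)  # make the commit record itself subject to the same rule
    finally:
        os.close(fd)


def demo():
    """Run the pattern against a temporary file and return its contents."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "journal.log")
        ordered_append(path, b"payload\n", b"commit\n")
        with open(path, "rb") as f:
            return f.read()
```

The code cannot observe what the storage stack does after fsync() returns;
it only shows where the application places its ordering point.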

On recovery, ZFS wants you to note and approve of any transactions lost
beyond the last one that was in the process of being flushed.  If the
last one didn't complete, that may be because of a power outage before
the cache flush could complete.  If more than one transaction didn't
completely reach stable storage, ZFS figures that the HW must have lied
about cache flushes.

Of course, one might actually want the filesystem to never issue cache
flushes except in response to sync()/fsync()/fdatasync() calls.  IIRC
there's no way to configure such behavior in ZFS -- one can only make
sync/fsync/fdatasync not wait for a cache flush.  For some uses one
might really want such behavior, though: losing a few transactions that
were not closed by a sync/fsync/fdatasync syscall may be tolerable.
But then, ZFS can keep transactions open for a fairly long time if
sync I/O is not requested by any apps...

Nico
