On 12/01/2015 10:44 PM, Peter Eisentraut wrote:
On 11/27/15 8:18 AM, Michael Paquier wrote:
On Fri, Nov 27, 2015 at 8:17 PM, Tomas Vondra wrote:
So, what's going on? The problem is that while the rename() is atomic, it's
not guaranteed to be durable without an explicit fsync on the parent
directory. And by default we only do fdatasync on the recycled segments,
which may not force fsync on the directory (and ext4 does not do that,
Yeah, that seems to be how the POSIX spec spells things out.
"If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall
force all currently queued I/O operations associated with the file
indicated by file descriptor fildes to the synchronized I/O completion
state. All I/O operations shall be completed as defined for
synchronized I/O file integrity completion."
If I understand that right, it is guaranteed that the rename() will be
atomic, meaning that there will be only one file even if there is a
crash, but that we need to fsync() the parent directory as mentioned.
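To make the pattern concrete, here is a minimal sketch of a rename followed by an fsync() on the parent directory. It is written in Python for brevity and is not PostgreSQL's actual code; the helper names fsync_dir and durable_rename are made up for illustration, and it assumes a POSIX system where a directory can be opened read-only and passed to fsync().

```python
import os


def fsync_dir(path):
    """Flush a directory's entries to stable storage (assumes POSIX)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_rename(src, dst):
    """Hypothetical helper: rename src to dst and make the rename durable."""
    # Flush the file itself first, so the new name never refers to data
    # that has not reached disk.
    fd = os.open(src, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # Atomic with respect to other processes, but the change lives only
    # in the directory entries, which may still be cached.
    os.rename(src, dst)

    # Flush the directory entries; without this step the rename can be
    # lost after a crash on some filesystems.
    dst_dir = os.path.dirname(os.path.abspath(dst))
    src_dir = os.path.dirname(os.path.abspath(src))
    fsync_dir(dst_dir)
    if src_dir != dst_dir:
        fsync_dir(src_dir)
```

Syncing the file before the rename and the directory after it is one conservative ordering; whether both steps are strictly needed depends on the filesystem, which is exactly the open question in this thread.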
I don't see anywhere in the spec that a rename needs an fsync of the
directory to be durable. I can see why that would be needed in
practice, though. File system developers would probably be able to
give a more definite answer.
Yeah, POSIX is the lowest common denominator. In this case the spec
seems not to require this durability guarantee (rename without fsync on
directory), which allows a POSIX-compliant filesystem to omit it.
At least that's my conclusion from reading https://lwn.net/Articles/322823/
However, as I explained in the original post, it's more complicated as
this only seems to be a problem with fdatasync. I've been unable to
reproduce the issue with wal_sync_method=fsync.
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services