Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Tomas Vondra Mon, 09 Apr 2018 12:54:53 -0700


On 04/09/2018 09:37 PM, Andres Freund wrote:
> 
> 
> On April 9, 2018 12:26:21 PM PDT, Anthony Iliopoulos <[email protected]> 
> wrote:
> 
>> I honestly do not expect that keeping around the failed pages will
>> be an acceptable change for most kernels, and as such the
>> recommendation will probably be to coordinate in userspace for the
>> fsync().
> 
> Why is that required? You could very well just keep per-inode
> information around about fatal failures that occurred. Report errors
> until that bit is explicitly cleared. Yes, that keeps some memory
> around until unmount if nobody clears it. But it's orders of
> magnitude less, and results in usable semantics.
>


Isn't the expectation that when an fsync call fails, the next one will
retry writing the pages in the hope that it succeeds?

Of course, it's also possible to do what you suggested, and simply mark
the inode as failed. In that case the next fsync can't possibly retry
the writes (e.g. after freeing some space on a thin-provisioned system),
but we'd get a reliable failure mode.
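
To make the two behaviours concrete, here is a rough toy model in Python.
It is only a sketch of the semantics being discussed, not how any kernel
actually implements this. The class ToyInode, the policy names "retry"
and "sticky", and the device_ok switch are all made up for illustration.

```python
import errno


class ToyInode:
    """Toy stand-in for an inode; purely illustrative, not a kernel structure."""

    def __init__(self, policy):
        self.policy = policy      # "retry" or "sticky" (hypothetical names)
        self.dirty = []           # pages waiting to be written back
        self.stable = []          # pages that reached the simulated storage
        self.failed = False       # sticky per-inode error bit
        self.device_ok = True     # set to False to simulate a write failure

    def write(self, page):
        self.dirty.append(page)

    def clear_error(self):
        # Explicitly clear the sticky bit, as suggested in the quoted text.
        self.failed = False

    def fsync(self):
        """Return 0 on success, or an errno value on failure."""
        if self.policy == "sticky" and self.failed:
            # Keep reporting the earlier failure until someone clears it.
            return errno.EIO
        if not self.dirty:
            return 0
        if self.device_ok:
            self.stable.extend(self.dirty)
            self.dirty.clear()
            return 0
        if self.policy == "sticky":
            # Pages are dropped, only the error bit is remembered.
            self.dirty.clear()
            self.failed = True
        # Under "retry" the pages stay dirty so the next fsync can try again.
        return errno.EIO
```

With "retry", a failed fsync followed by a repaired device lets the next
fsync succeed and the page reaches storage. With "sticky", fsync keeps
returning EIO even after the device recovers, and the page is gone for
good; clearing the bit only stops the error reports.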

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
