Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Thomas Munro Wed, 04 Apr 2018 00:34:03 -0700

On Wed, Apr 4, 2018 at 6:00 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 4 April 2018 at 13:29, Thomas Munro <thomas.mu...@enterprisedb.com>
> wrote:
>> /* Ensure that we skip any errors that predate opening of the file */
>> f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
>>
>> [...]
>
> Holy hell. So even PANICing on fsync() isn't sufficient, because the kernel
> will deliberately hide writeback errors that predate our fsync() call from
> us?


To be precise: errors that predate the opening of the file by the process that calls fsync().
Yeah, it sure looks that way based on the above code fragment.  Does
anyone know better?

> Does that mean that the ONLY ways to do reliable I/O are:
>
> - single-process, single-file-descriptor write() then fsync(); on failure,
> retry all work since last successful fsync()

I suppose you could come up with some crazy complicated IPC scheme to
make sure that the checkpointer always has an fd older than any writes
to be flushed, with some fallback strategy for when it can't take any
more fds.

I haven't got any good ideas right now.
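
For reference, the simple single-descriptor case from the quoted bullet
might look roughly like the sketch below.  The helper name
write_all_then_fsync() is made up for illustration and is not anything in
PostgreSQL; it assumes the caller opened fd before issuing any of the
writes it wants flushed, and that on failure the caller redoes all work
since the last successful fsync().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): write all of buf through fd,
 * then fsync() that same fd.  Returns 0 on success, -1 on failure with
 * errno set.  Because the descriptor was opened before the writes, a
 * writeback error for them should not predate the open.
 */
static int
write_all_then_fsync(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len)
    {
        ssize_t n = write(fd, buf + done, len - done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing; try again */
            return -1;
        }
        done += (size_t) n;
    }

    /* On failure the caller must redo everything since the last good fsync. */
    return fsync(fd) == 0 ? 0 : -1;
}
```

This only covers one process with one fd; it does nothing for the
multi-process case where backends write and the checkpointer fsyncs.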

> - direct I/O

As a bit of an aside, I gather that when you resize files (think
truncating/extending relation files) you still need to call fsync()
even if you read/write all data with O_DIRECT, to make it flush the
filesystem metadata.  I have no idea if that could also be affected
by eaten writeback errors.

-- 
Thomas Munro
http://www.enterprisedb.com
