Re: Postgres, fsync, and OSs (specifically linux)

Thomas Munro Sun, 29 Apr 2018 18:12:00 -0700

On Sun, Apr 29, 2018 at 1:58 PM, Craig Ringer <[email protected]> wrote:
> On 28 April 2018 at 23:25, Simon Riggs <[email protected]> wrote:
>> On 27 April 2018 at 15:28, Andres Freund <[email protected]> wrote:
>>>   While I'm a bit concerned adding user-code before a checkpoint, if
>>>   we'd do it as a shell command it seems pretty reasonable. And useful
>>>   even without concern for the fsync issue itself. Checking for IO
>>>   errors could e.g. also include checking for read errors - it'd not be
>>>   unreasonable to not want to complete a checkpoint if there'd been any
>>>   media errors.
>>
>> It seems clear that we need to evaluate our compatibility not just
>> with an OS, as we do now, but with an OS/filesystem.
>>
>> Although people have suggested some approaches, I'm more interested in
>> discovering how we can be certain we got it right.
>
> TBH, we can't be certain, because there are too many failure modes,
> some of which we can't really simulate in practical ways, or automated
> ways.


+1

Testing is good, but unless you have a categorical statement from the
relevant documentation or kernel team, or you have the source code, I'm
not sure how you can ever really be sure about this.  I think we now
have a fair idea of what several open source kernels do, but we still
haven't got a clue about Windows, AIX, HP-UX and Solaris, and we only
have half the answer for illumos.  No "negative" test result can prove
that those systems never throw away write-back errors or data.
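
To illustrate why a passing test proves so little, here is a minimal
sketch of the kind of probe one might write.  The helper name
write_and_fsync and the scratch file name "probe" are hypothetical and
not taken from PostgreSQL.  The probe can only observe errors that the
kernel chooses to report through write() or fsync(); a clean result
says nothing about an error that was dropped before fsync() was called.

```python
import os
import tempfile


def write_and_fsync(path, data):
    """Write data to path and fsync it.

    Returns None if no error was reported, otherwise the errno value.
    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        try:
            view = memoryview(data)
            # os.write may write fewer bytes than requested, so loop.
            while len(view) > 0:
                written = os.write(fd, view)
                view = view[written:]
            # fsync is where deferred write-back errors are supposed to
            # surface; Python raises OSError when it fails.
            os.fsync(fd)
        except OSError as exc:
            return exc.errno
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        result = write_and_fsync(os.path.join(scratch, "probe"), b"hello")
        # None only means "no error was reported", not "data is durable".
        print("reported errno:", result)
```

A None result on a healthy disk is the "negative" outcome mentioned
above: it is consistent both with a kernel that reports every
write-back error and with one that silently discards them.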

Considering the variety in interpretation and liberties taken, I
wonder if fsync() is underspecified and someone should file an issue
over at http://www.opengroup.org/austin/ about that.

-- 
Thomas Munro
http://www.enterprisedb.com
