On 04/09/2018 09:37 PM, Andres Freund wrote:
>
> On April 9, 2018 12:26:21 PM PDT, Anthony Iliopoulos <ail...@altatus.com>
> wrote:
>
>> I honestly do not expect that keeping around the failed pages will
>> be an acceptable change for most kernels, and as such the recommendation
>> will probably be to coordinate in userspace for the fsync().
>
> Why is that required? You could very well just keep per inode
> information about fatal failures that occurred around. Report errors
> until that bit is explicitly cleared. Yes, that keeps some memory
> around until unmount if nobody clears it. But it's orders of
> magnitude less, and results in usable semantics.
>
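The difference between the per-inode error bit proposed in the quoted message and "retry on the next fsync" semantics can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not kernel code: `Policy`, `ModelInode`, `model_fsync`, `clear_error` and the `device_ok` flag are all hypothetical names invented for illustration, and `device_ok=False` merely stands in for storage that rejects every write (for example a thin-provisioned volume that has run out of space).

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    RETRY = "retry"    # failed pages stay dirty; the next fsync rewrites them
    STICKY = "sticky"  # failed pages are dropped; a per-inode error bit is kept


@dataclass
class ModelInode:
    """Hypothetical, heavily simplified model of one inode's writeback state."""
    policy: Policy
    dirty_pages: int = 0
    error_bit: bool = False


def model_fsync(inode: ModelInode, device_ok: bool) -> int:
    """Return 0 on success or -1 on failure, mirroring fsync()'s convention.

    device_ok simulates the storage: False means every page write fails.
    """
    if inode.policy is Policy.STICKY and inode.error_bit:
        return -1  # keep reporting the failure until the bit is cleared
    if inode.dirty_pages == 0:
        return 0
    if not device_ok:
        if inode.policy is Policy.STICKY:
            inode.dirty_pages = 0  # failed pages are discarded: data is lost
            inode.error_bit = True
        # under RETRY the pages simply stay dirty for the next attempt
        return -1
    inode.dirty_pages = 0  # all pages written successfully
    return 0


def clear_error(inode: ModelInode) -> None:
    """Explicitly acknowledge a sticky failure (hypothetical interface)."""
    inode.error_bit = False


if __name__ == "__main__":
    retry = ModelInode(Policy.RETRY, dirty_pages=3)
    print(model_fsync(retry, device_ok=False))  # -1: write failed
    print(model_fsync(retry, device_ok=True))   # 0: pages rewritten after space freed

    sticky = ModelInode(Policy.STICKY, dirty_pages=3)
    print(model_fsync(sticky, device_ok=False))  # -1: write failed, pages dropped
    print(model_fsync(sticky, device_ok=True))   # -1: failure still reported
    clear_error(sticky)
    print(model_fsync(sticky, device_ok=True))   # 0: but the data never reached disk
```

In this model the retry policy lets a later fsync succeed once the storage recovers, while the sticky policy never silently reports success for writes that were lost; it only returns 0 again after the error has been explicitly acknowledged.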
Isn't the expectation that when an fsync call fails, the next one will retry writing the pages in the hope that it succeeds?

Of course, it's also possible to do what you suggested and simply mark the inode as failed. In that case the next fsync can't possibly retry the writes (e.g. after freeing some space on a thin-provisioned system), but we'd get a reliable failure mode.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services