On Tue, Apr 10, 2018 at 1:37 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> ... but *only if they hit an I/O error* or they're on a FS that
> doesn't reserve space and hit ENOSPC.
>
> It still does 99% of the job. It still flushes all buffers to
> persistent storage and maintains write ordering. It may not detect and
> report failures to the user how we'd expect it to, yes, and that's not
> great. But it's hardly throw up our hands and give up territory
> either. Also, at least for initdb, we can make initdb fsync() its own
> files before close(). Annoying but hardly the end of the world.

I think we'd need every child postgres process started by initdb to do that individually, which I suspect would slow down initdb quite a lot. Now admittedly for anybody other than a PostgreSQL developer that's only a minor issue, and our regression tests mostly run with fsync=off anyway. But I have a strong suspicion that our assumptions about how fsync() reports errors are baked into an awful lot of parts of the system, and by the time we get done unbaking them I think it's going to be really surprising if we haven't done real harm to overall system performance.

BTW, I took a look at the MariaDB source code to see whether they've got this problem too and it sure looks like they do. os_file_fsync_posix() retries the fsync in a loop with a 0.2 second sleep after each retry. It warns after 100 failures and fails an assertion after 1000 failures. It is hard to understand why they would have written the code this way unless they expect errors reported by fsync() to continue being reported until the underlying condition is corrected. But, it looks like they wouldn't have the problem that we do with trying to reopen files to fsync() them later -- I spot checked a few places where this code is invoked and in all of those it looks like the file is already expected to be open.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
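
For illustration, the retry pattern described above for os_file_fsync_posix() might look roughly like the C sketch below. This is not MariaDB's source: the function name fsync_with_retry() and the three constants are hypothetical, chosen only to mirror the numbers quoted in the message (0.2 second sleep, warning at 100 failures, assertion at 1000). The loop is only sound if a failed fsync() keeps failing until the underlying condition is fixed, which is exactly the assumption being questioned in this thread.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical constants mirroring the behaviour described in the text. */
#define WARN_AFTER_FAILURES   100
#define ABORT_AFTER_FAILURES  1000
#define RETRY_SLEEP_NSEC      200000000L   /* 0.2 seconds */

/*
 * Retry fsync() on an already-open descriptor until it reports success.
 *
 * Assumption baked into this pattern: an error from fsync() is reported
 * again on every later call until the problem is corrected. If the
 * kernel instead reports the error once and then clears it, a later
 * "successful" retry does not prove the data reached stable storage.
 */
int
fsync_with_retry(int fd)
{
    int failures = 0;
    struct timespec delay = { 0, RETRY_SLEEP_NSEC };

    for (;;)
    {
        if (fsync(fd) == 0)
            return 0;               /* reported success */

        if (errno == EINTR)
            continue;               /* interrupted by a signal, not counted */

        failures++;

        if (failures == WARN_AFTER_FAILURES)
            fprintf(stderr, "fsync(%d) has failed %d times: %s\n",
                    fd, failures, strerror(errno));

        /* Give up hard once the failure limit is reached. */
        assert(failures < ABORT_AFTER_FAILURES);

        nanosleep(&delay, NULL);    /* wait 0.2 s before the next attempt */
    }
}
```

A caller would invoke fsync_with_retry(fd) on a descriptor it already holds open, which matches the observation above that the MariaDB call sites appear to expect the file to be open already rather than reopening it later.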