On Mon, Apr 9, 2018 at 8:16 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> In the mean time, I propose that we fsync() on close() before we age FDs out
> of the LRU on backends. Yes, that will hurt throughput and cause stalls, but
> we don't seem to have many better options. At least it'll only flush what we
> actually wrote to the OS buffers not what we may have in shared_buffers. If
> the bgwriter does the same thing, we should be 100% safe from this problem
> on 4.13+, and it'd be trivial to make it a GUC much like the fsync or
> full_page_writes options that people can turn off if they know the risks /
> know their storage is safe / don't care.
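
For concreteness, here is a minimal sketch of what "fsync() on close()" amounts to at the system-call level. It is only an illustration, not PostgreSQL's actual fd.c code: the helper name fsync_then_close() is hypothetical, and how a caller should react to a failure (retry, ERROR, PANIC) is deliberately left open.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): flush a file
 * descriptor before closing it, so that any writeback error is
 * reported to this process rather than lost.
 *
 * Returns 0 on success, otherwise the first errno seen from
 * fsync() or close().
 */
static int
fsync_then_close(int fd)
{
    int saved_errno = 0;

    assert(fd >= 0);

    /*
     * fsync() flushes every dirty page of the underlying file, not
     * just the pages that were written through this descriptor.
     */
    if (fsync(fd) != 0)
        saved_errno = errno;

    /* Always close the descriptor; keep the first error we saw. */
    if (close(fd) != 0 && saved_errno == 0)
        saved_errno = errno;

    return saved_errno;
}
```

Note that the kernel gives no way to flush only the writes that went through a particular descriptor, so a call like this pays for whatever dirty data the file happens to have at that moment.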

Ouch. If a process exits -- say, because the user typed \q into psql -- then you're talking about potentially calling fsync() on a really large number of file descriptors, flushing many gigabytes of data to disk. And it may well be that you never actually wrote any data to any of those file descriptors -- those writes could have come from other backends. Or you may have written a little bit of data through those FDs, but there could be lots of other data that you end up flushing incidentally. Perfectly innocuous things like starting up a backend, running a few short queries, and then having that backend exit suddenly turn into something that could have a massive system-wide performance impact.

Also, if a backend ever manages to exit without running through this code, or writes any dirty blocks afterward, then this still fails to fix the problem completely. I guess that's probably avoidable -- we can put this late in the shutdown sequence and PANIC if it fails.

I have a really tough time believing this is the right way to solve the problem. We suffered for years because of ext3's desire to flush the entire page cache whenever any single file was fsync()'d, which was terrible. Eventually ext4 became the norm, and the problem went away. Now we're going to deliberately insert logic to do a very similar kind of terrible thing because the kernel developers have decided that fsync() doesn't have to do what it says on the tin? I grant that there doesn't seem to be a better option, but I bet we're going to have a lot of really unhappy users if we do this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company