On Sun, Apr 8, 2018 at 02:16:07PM +1200, Thomas Munro wrote:
> So, what can we actually do about this new Linux behaviour?
>
> Idea 1:
>
> * whenever you open a file, either tell the checkpointer so it can
> open it too (and wait for it to tell you that it has done so, because
> it's not safe to write() until then), or send it a copy of the file
> descriptor via IPC (since duplicated file descriptors share the same
> f_wb_err)
>
> * if the checkpointer can't take any more file descriptors (how would
> that limit even work in the IPC case?), then it somehow needs to tell
> you that so that you know that you're responsible for fsyncing that
> file yourself, both on close (due to fd cache recycling) and also when
> the checkpointer tells you to
>
> Maybe it could be made to work, but sheesh, that seems horrible.  Is
> there some simpler idea along these lines that could make sure that
> fsync() is only ever called on file descriptors that were opened
> before all unflushed writes, or file descriptors cloned from such file
> descriptors?
>
> Idea 2:
>
> Give up, complain that this implementation is defective and
> unworkable, both on POSIX-compliance grounds and on POLA grounds, and
> campaign to get it fixed more fundamentally (actual details left to
> the experts, no point in speculating here, but we've seen a few
> approaches that work on other operating systems including keeping
> buffers dirty and marking the whole filesystem broken/read-only).
>
> Idea 3:
>
> Give up on buffered IO and develop an O_SYNC | O_DIRECT based system ASAP.
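
For reference, the "send it a copy of the file descriptor via IPC" half
of Idea 1 might look roughly like the sketch below. It is only an
illustration in Python (3.9 or later, Unix only), not PostgreSQL code:
the names send_fd, recv_fd, demo and the file name "relation_segment"
are made up for the example, and both ends of the socket live in one
process purely to keep it short. The descriptor is handed over before
the write, and the fsync() is then issued on the received copy.

```python
import os
import socket
import tempfile


def send_fd(sock, fd):
    # Ancillary data needs at least one byte of ordinary payload to ride on.
    socket.send_fds(sock, [b"F"], [fd])


def recv_fd(sock):
    # Returns the duplicated descriptor delivered via SCM_RIGHTS.
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    return fds[0]


def demo():
    # Hypothetical stand-ins for a backend and the checkpointer.
    backend_sock, checkpointer_sock = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_STREAM
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "relation_segment")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

        # Hand the descriptor over before writing anything through it.
        send_fd(backend_sock, fd)
        os.write(fd, b"page data")

        received = recv_fd(checkpointer_sock)
        # fsync() on the received copy; raises OSError if the kernel
        # reports a failure.
        os.fsync(received)

        # Both descriptors refer to the same file.
        same_file = os.fstat(fd).st_ino == os.fstat(received).st_ino
        distinct_fds = received != fd

        os.close(received)
        os.close(fd)
    backend_sock.close()
    checkpointer_sock.close()
    return same_file and distinct_fds
```

This leaves out the hard part raised above, namely what happens when
the receiving side cannot accept any more descriptors.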

Idea 4 would be for people to assume their database is corrupt if their
server logs report any I/O error on the file systems Postgres uses.

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +