On 2/13/17 12:10 AM, Michael Paquier wrote: > On Tue, Jan 31, 2017 at 11:07 AM, Michael Paquier > <michael.paqu...@gmail.com> wrote: >> On Mon, Jan 30, 2017 at 10:52 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote: >>> If that can happen, don't we have the same problem in many other places? >>> Like, all the SLRUs? They don't fsync the directory either. >> >> Right, pg_commit_ts and pg_clog enter in this category. > > Implemented as attached. > >>> Is unlink() guaranteed to be durable, without fsyncing the directory? If >>> not, then we need to fsync() the directory even if there are no files in it >>> at the moment, because some might've been removed earlier in the checkpoint >>> cycle. >> >> Hm... I am not an expert in file systems. At least on ext4 I can see >> that unlink() is atomic, but not durable. So if an unlink() is >> followed by a power failure, the previously unlinked file could be >> here if the parent directory is not fsync'd. > > So I have been doing more work on this patch, with the following things done: > - Flush pg_clog, pg_commit_ts and pg_twophase at checkpoint phase to > ensure their durability. > - Create a durable_unlink() routine to give a way to perform a durable > file removal. > I am now counting 111 calls to unlink() in the backend code, and > looking at all of them most look fine with plain unlink() if they are > not made durable as they work on temporary files (see timeline.c for > example), with some exceptions: > - In pg_stop_backup, the old backup_label and tablespace_map removal > should be durable to avoid putting the system in a wrong state after > power loss. Other calls of unlink() are followed by durable_rename so > they are fine if let as such. > - Removal of old WAL segments should be durable as well. There is > already an effort to rename them durably in case of a segment > recycled. In case of a power loss, a file that should have been > removed could remain in pg_xlog. > > Looking around, I have bumped as well on the following bug report for > SQlite which is in the same category of things: > http://sqlite.1065341.n5.nabble.com/Potential-bug-in-crash-recovery-code-unlink-and-friends-are-not-synchronous-td68885.html > Scary to see that in this case durability can be a problem at > transaction commit...
This patch applies cleanly and compiles at cccbdde. Ashutosh, do you know when you'll have a chance to review? -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers