Robert Haas <robertmh...@gmail.com> writes:
> I notice that RelationCreateStorage() creates the main fork on disk
> before writing (let alone flushing) WAL. So if PG gets killed at that
> point, we end up with an orphaned file on disk. I think that we could
> even extend the relation a few times before WAL gets written, so I
> don't even think it's necessarily a zero-size file. We could perhaps
> avoid this by writing and flushing a WAL record that includes the
> creating XID before touching the disk; when we replay the record, we
> create the file but then delete it if the XID fails to commit before
> recovery ends. But I guess maybe our feeling is that it's just not
> worth taking a performance hit for this?

That design is intentional. If the file create fails, and you've
already written a WAL record that says you created it, you are flat
out screwed. You can't even PANIC --- if you do, then the replay of
the WAL record will likely fail and PANIC again, leaving the database
dead in the water.

Orphaned files, in contrast, are completely non-dangerous --- the worst
they can do is waste a little bit of disk space. That's a cheap price
to pay for not having an unrecoverable database after a create failure.

This is essentially the same reason why CREATE DATABASE and related
commands xlog directory copy operations only after completing them.
That potentially wastes much more than a few blocks; but it's still
non-dangerous, and far safer than the alternative.

			regards, tom lane
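
The ordering argument above can be illustrated with a toy model. This is
only a sketch under stated assumptions, not PostgreSQL's actual code: the
function names, the in-memory list standing in for WAL, and the
crash/failure flags are all hypothetical and exist purely to show the two
failure modes side by side.

```python
import os
import tempfile


def create_then_log(path, wal, crash_before_log=False):
    """Toy model: create the file first, then append a WAL record."""
    with open(path, "x"):
        pass  # the file now exists on disk
    if crash_before_log:
        return  # simulated crash: file exists, WAL has no record of it
    wal.append(("create", path))


def log_then_create(path, wal, create_fails=False):
    """Toy model: append the WAL record first, then try to create the file."""
    wal.append(("create", path))
    if create_fails:
        raise OSError("simulated create failure")
    with open(path, "x"):
        pass


def replay(wal, create_fails=False):
    """Redo every logged create; an exception stands in for a PANIC in recovery."""
    for op, path in wal:
        if op == "create":
            if create_fails:
                raise OSError("replay cannot create " + path)
            open(path, "a").close()


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as d:
        # Ordering 1: crash between the create and the WAL write.
        wal = []
        orphan = os.path.join(d, "rel_a")
        create_then_log(orphan, wal, crash_before_log=True)
        try:
            replay(wal)  # nothing was logged, so there is nothing to redo
            results["recovery_ok_create_first"] = True
        except OSError:
            results["recovery_ok_create_first"] = False
        results["orphan_exists"] = os.path.exists(orphan)

        # Ordering 2: record is logged, the create fails, and fails again on replay.
        wal = []
        target = os.path.join(d, "rel_b")
        try:
            log_then_create(target, wal, create_fails=True)
        except OSError:
            pass  # the WAL already claims the file was created
        try:
            replay(wal, create_fails=True)
            results["recovery_ok_log_first"] = True
        except OSError:
            results["recovery_ok_log_first"] = False
    return results
```

In this model, calling demo() leaves an orphaned file in the create-first
case while recovery still completes; in the log-first case recovery itself
raises, which is the "PANIC again on replay" situation described above.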