Re: [HACKERS] Why copy_relation_data only use wal when WALarchiving is enabled

Florian G. Pflug Wed, 17 Oct 2007 06:09:35 -0700

Heikki Linnakangas wrote:

I wrote:

Unfortunately I don't see any easy way to fix it. One approach would be
to avoid reusing the relfilenodes until next checkpoint, but I don't see
any nice place to keep track of OIDs that have been dropped since last
checkpoint.


Ok, here's one idea:

Instead of deleting the file immediately on commit of DROP TABLE, the
file is truncated to release the space, but not unlink()ed, to avoid
reusing that relfilenode. The truncated file can be deleted after next
checkpoint.

Now, how does checkpoint know what to delete? We can use the fsync
request mechanism for that. When a file is truncated, a new kind of
fsync request, a "deletion request", is sent to the bgwriter, which
collects all such requests to a list. Before checkpoint calculates new
RedoRecPtr, the list is swapped with an empty one, and after writing the
new checkpoint record, all the files that were in the list are deleted.

We would leak empty files on crashes, but we leak files on crashes
anyway, so that shouldn't be an issue. This scheme wouldn't require
catalog changes, so it would be suitable for backpatching.

Any better ideas?

Couldn't we fix this by forcing a checkpoint before we commit the transactionthat created the new pg_class entry for the clustered table? Or rather, moregenerally, before committing a transaction that created a new non-temporaryrelfilenode but didn't WAL-log any subsequent inserts.

Thats of course a rather sledgehammer-like approach to this problem - but atleast for the backbranched the fix would be less intrusive...


regards, Florian Pflug


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq

Re: [HACKERS] Why copy_relation_data only use wal when WALarchiving is enabled

Reply via email to