I wrote:
> Unfortunately I don't see any easy way to fix it. One approach would be
> to avoid reusing the relfilenodes until next checkpoint, but I don't see
> any nice place to keep track of OIDs that have been dropped since last
> checkpoint.

Ok, here's one idea:

Instead of deleting the file immediately on commit of DROP TABLE, the
file is truncated to release the space, but not unlink()ed, to avoid
reusing that relfilenode. The truncated file can be deleted after next

Now, how does checkpoint know what to delete? We can use the fsync
request mechanism for that. When a file is truncated, a new kind of
fsync request, a "deletion request", is sent to the bgwriter, which
collects all such requests to a list. Before checkpoint calculates new
RedoRecPtr, the list is swapped with an empty one, and after writing the
new checkpoint record, all the files that were in the list are deleted.

We would leak empty files on crashes, but we leak files on crashes
anyway, so that shouldn't be an issue. This scheme wouldn't require
catalog changes, so it would be suitable for backpatching.

Any better ideas?

Do we care enough about this to fix this? Enough to backpatch? The
probability of this happening is pretty small, but the consequences are
really bad, so my vote is "yes" and "yes".

  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not

Reply via email to