On 2017-08-14 14:40:46 -0400, Tom Lane wrote:
> The core problem with zapping non-temp table files is that you can't
> do that unless you're sure you have consistent, up-to-date pg_class
> data that nobody else is busy adding to.  It's hard to see an external
> application being able to do that safely.  You certainly can't do it
> at the point in the postmaster startup cycle where we currently do
> the other things --- for those, we rely only on filesystem naming
> conventions to identify what to zap.

I think there are some possibilities to close the gap here. We could,
e.g., have <relfilenode>.delete_on_crash marker files that get installed
when creating a new persistent relfilenode. If we set things up so that
they get deleted post-commit, but still inside the critical section, we
could rely on them being present in case of a crash, yet consistently
removed during WAL replay. At the end of recovery, we'd iterate over the
whole datadir and nuke all relations that still have marker files present.

I first thought that'd cost an additional fsync per relation
created. But I think we can actually delay that to a pre-commit phase
if we have XLOG_SMGR_CREATE create those markers via a flag, and fsync
them just before checkpoint (via the usual delayed fsync mechanism).
That'd still require an XLogFlush(), but that seems hard to avoid unless
we just don't create relations at the FS level until buffers are
evicted and/or BufferSync() runs.

Alternatively, we could do without marker files, at the cost of some
added complexity: keep track of all "uncommitted new files" in memory,
and log them at every checkpoint. Commit/abort records clear elements of
that list. Since we always start replay at the beginning of a
checkpoint, we'd always reach a point with such an up-to-date list of
pending-action files before reaching end-of-recovery. At end-of-recovery
we can delete all unconfirmed files.  To avoid running out of memory due
to too many tracked relations, we might still need marker files...
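The in-memory variant can be modeled as a small list that replay rebuilds
starting from the checkpoint. In this sketch PendingList, Record,
pending_add(), pending_resolve() and replay() are all hypothetical; the
"WAL" is just an array of toy records, and the checkpoint record carries
a copy of the list it logged. Whatever survives replay is the set of
files to delete at end-of-recovery. MAX_PENDING stands in for the memory
bound that might force a fallback to marker files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed cap; beyond it we'd need marker files after all. */
#define MAX_PENDING 64

typedef unsigned int Oid;
typedef unsigned int TransactionId;

/* One relation file created by a transaction that hasn't finished yet. */
typedef struct {
    TransactionId xid;
    Oid relfilenode;
} PendingFile;

typedef struct {
    PendingFile items[MAX_PENDING];
    size_t n;
} PendingList;

typedef enum { REC_CHECKPOINT, REC_CREATE, REC_COMMIT, REC_ABORT } RecType;

/* Toy WAL record, holding only the fields this bookkeeping needs. */
typedef struct {
    RecType type;
    TransactionId xid;            /* REC_CREATE, REC_COMMIT, REC_ABORT */
    Oid relfilenode;              /* REC_CREATE only */
    const PendingList *snapshot;  /* REC_CHECKPOINT only: the logged list */
} Record;

/* A new file was created: remember it until its transaction finishes. */
bool pending_add(PendingList *l, TransactionId xid, Oid relfilenode)
{
    if (l->n == MAX_PENDING)
        return false;  /* out of room: would fall back to a marker file */
    l->items[l->n].xid = xid;
    l->items[l->n].relfilenode = relfilenode;
    l->n++;
    return true;
}

/* Commit or abort record: forget every file created by xid. */
void pending_resolve(PendingList *l, TransactionId xid)
{
    size_t kept = 0;

    for (size_t i = 0; i < l->n; i++)
        if (l->items[i].xid != xid)
            l->items[kept++] = l->items[i];
    l->n = kept;
}

/*
 * Replay starting at a checkpoint record.  Whatever is left in *pending
 * afterwards belongs to transactions that never committed or aborted,
 * so those files are the ones to delete at end-of-recovery.
 */
void replay(const Record *recs, size_t nrecs, PendingList *pending)
{
    pending->n = 0;
    for (size_t i = 0; i < nrecs; i++) {
        const Record *r = &recs[i];

        switch (r->type) {
        case REC_CHECKPOINT:
            *pending = *r->snapshot;  /* reload the list the checkpoint logged */
            break;
        case REC_CREATE:
            (void) pending_add(pending, r->xid, r->relfilenode);
            break;
        case REC_COMMIT:
        case REC_ABORT:
            pending_resolve(pending, r->xid);
            break;
        }
    }
}
```

For example, replaying a checkpoint whose list holds one file from xid
100, followed by creates from xids 101 and 102, a commit of 100 and an
abort of 102, leaves only xid 101's file as a deletion candidate.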

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)