Tom Lane wrote:
"Florian G. Pflug" <[EMAIL PROTECTED]> writes:
It might be even worse - I'm not sure that a rename is an atomic operation
on most filesystems.
rename(2) is specified to be atomic by POSIX, but relinking a file into
a different directory can hardly be --- it's not even provided as a
single kernel call, is it?
I'd have thought that POSIX only guarantees that, if the new name already
exists, it's atomically replaced. But I might be wrong....
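
To make the "atomically replaced" case concrete, here is a minimal sketch in Python (not backend code). The helper name replace_atomically is made up for illustration. os.replace() is a thin wrapper around rename(2) on POSIX systems. The sketch stays within a single directory, so it says nothing about the cross-directory case raised above.

```python
import os
import tempfile


def replace_atomically(path, data):
    """Replace the contents of `path` by renaming a fully written temp file over it.

    Hypothetical helper for illustration only. The temp file lives in the
    same directory as the target, so the rename never crosses directories.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk before the rename
        # On POSIX this is rename(2): a reader opening `path` sees either the
        # old file or the new one, never a partially written file.
        os.replace(tmp_path, path)
    finally:
        # Only true if something failed before the rename took effect.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

Note that this only covers replacing an existing name; it does not address processes such as bgwriter that already hold the old file open.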
And there's still the problem that changing the filename on-the-fly is
going to break tons of low-level stuff, most of which is not supposed to
know about transactions at all, notably bgwriter.
Good point - I thought that we wouldn't have to care about this because
we could close the relation before renaming in the committing backend
and be done with it, because other backends won't see the new file
before we update the clog. But you're right, bgwriter is a problem
and one not easily solved...
So that rename-on-commit idea seems to be quite dead...
What I was thinking about was a "flag file" separate from the data file
itself, a bit like what we use for archiver signaling. If nnnn is the
new data file, then "touch nnnn.new" to mark the file as needing to be
deleted on restart. Remove these files just *before* commit. This
leaves you with a narrow window between removing the flag file and
actually committing, but there's no risk of having to PANIC --- if the
remove fails, you just abort the transaction.
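
A rough sketch of that flag-file protocol, in Python for brevity rather than as backend code. All function names are hypothetical, and creating the flag before the data file is an assumption of this sketch, not something stated above.

```python
import os

FLAG_SUFFIX = ".new"  # suffix taken from the proposal above


def create_relation_file(path):
    """Create a new data file together with its flag file (hypothetical helper)."""
    # Assumption: the flag is created first, so a crash between the two
    # steps can leave a stray flag but never an unflagged new data file.
    open(path + FLAG_SUFFIX, "w").close()
    open(path, "w").close()


def prepare_commit(path):
    """Remove the flag just before commit.

    An OSError propagates to the caller, which would then abort the
    transaction instead of having to PANIC.
    """
    os.unlink(path + FLAG_SUFFIX)


def cleanup_on_restart(directory):
    """Delete every data file that still has a flag file; return their names."""
    removed = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(FLAG_SUFFIX):
            continue
        flag = os.path.join(directory, name)
        data = flag[: -len(FLAG_SUFFIX)]
        if os.path.exists(data):
            os.unlink(data)
            removed.append(os.path.basename(data))
        os.unlink(flag)
    return removed
```

The narrow window mentioned above shows up here as a crash after prepare_commit() returns but before the commit record is written: the file is then no longer flagged and would not be removed on restart.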
Hm.. we could call the file "nnn.xid.new", and delete it after the commit,
silently ignoring any failures. During both database-wide VACUUM and
after recovery we'd remove any leftover *.xid.new files, but only
if the xid is marked committed in the clog. After that cleanup step,
we'd delete any files which still have an associated flag file.
Processing those nnn.xid.new files during VACUUM is needed only to
avoid problems caused by xid wraparound - it could perhaps be
replaced by naming the file nnn.epoch.xid.new instead.
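
The two-step cleanup could be sketched roughly like this (Python, for illustration only). The is_committed callback is a hypothetical stand-in for a clog lookup, and the file-name pattern assumes purely numeric file names and xids. The epoch variant is not covered.

```python
import os
import re

# Matches flag files named "<file>.<xid>.new"; numeric names are an assumption.
FLAG_RE = re.compile(r"^(?P<file>\d+)\.(?P<xid>\d+)\.new$")


def cleanup_flag_files(directory, is_committed):
    """Run the two cleanup steps; return the names of the data files deleted.

    `is_committed` is a hypothetical callable taking an integer xid and
    returning True if that xid is marked committed.
    """
    leftover = []
    # Step 1: remove leftover flags whose xid committed; their data files stay.
    for name in sorted(os.listdir(directory)):
        match = FLAG_RE.match(name)
        if match is None:
            continue
        flag = os.path.join(directory, name)
        if is_committed(int(match.group("xid"))):
            os.unlink(flag)
        else:
            leftover.append((flag, match.group("file")))
    # Step 2: any data file that still has a flag belongs to a transaction
    # that did not commit, so both the data file and the flag are removed.
    removed = []
    for flag, file_name in leftover:
        data = os.path.join(directory, file_name)
        if os.path.exists(data):
            os.unlink(data)
            removed.append(file_name)
        os.unlink(flag)
    return removed
```

For example, with flags 2001.500.new and 2002.501.new on disk and only xid 500 committed, the call removes data file 2002 and both flags, and leaves 2001 in place.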
However, this has nonzero overhead compared to the current behavior.
I'm still dubious that we have a problem that needs solving ...
I agree that file leakage is not a critical problem - if it were, there'd
be many more complaints...
But it's still something that a postgres DBA has to be aware of, because
it might bite you quite badly. Since IMHO admin friendliness is one of
the strengths of postgresql, removing the possibility of leakage would be
nice in the long term.
Nothing that needs any rushing, though - and nothing that we'd want to pay
for in terms of performance.