[HACKERS] smgrsettransient mechanism is full of bugs

Tom Lane Sun, 14 Oct 2012 18:19:09 -0700

I got a bit suspicious of the transient-file mechanism introduced in
commit fba105b1099f4f5fa7283bb17cba6fed2baa8d0c after noticing that
CleanupTempFiles seemed to take an unreasonable amount of time in a
test case that didn't involve any temp files; cf.
http://archives.postgresql.org/message-id/7110.1349392...@sss.pgh.pa.us

After further review, I have become convinced that in fact it's
completely broken and needs to be redone from scratch. The temp-file
marking at the fd.c level can easily get out of sync with the marking
at the smgr level, and that marking isn't too consistent with reality
either, which means we have all of the following problems:

(1) It can leak kernel descriptors, as reported in
http://archives.postgresql.org/message-id/b9bea448-978f-4a14-a088-3fd82214f...@pvv.ntnu.no
The triggering sequence for that appears to be:
* Transaction 1 does a blind write, which sets FD_XACT_TRANSIENT.
* At transaction close, we close the kernel FD and clear
FD_XACT_TRANSIENT in the VFD, but the VFD, the smgr relation,
and the md.c data structure are all still there.
* Transaction 2 does another blind write on the same file. This
does not cause FD_XACT_TRANSIENT to get set, because the md.c
data structure already exists.
* Now we are carrying a "leaked" kernel FD that will never get
closed short of a CacheInvalidateSmgr message, which doesn't
happen in a dropdb scenario. (That might be a bug in itself.)
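
The sequence in (1) can be sketched as a toy model. This is not PostgreSQL
code: every name below (ToyVfd, ToyMdFile, toy_blind_write and so on) is
hypothetical, and the rule "set the flag only when the md-level structure is
first created" is my reading of the description above, not a copy of the real
logic.

```c
/* Toy model of the leak sequence in (1). NOT PostgreSQL code; all names
 * are hypothetical stand-ins for the structures described in the text. */
#include <assert.h>
#include <stdbool.h>

typedef struct ToyVfd {
    bool kernel_fd_open;   /* stands in for a real kernel descriptor */
    bool xact_transient;   /* stands in for FD_XACT_TRANSIENT */
} ToyVfd;

typedef struct ToyMdFile {
    bool   exists;         /* md.c-level structure has been created */
    ToyVfd vfd;
} ToyMdFile;

/* Blind write: the transient flag is set only when the md-level
 * structure is created for the first time (assumption, see lead-in). */
static void toy_blind_write(ToyMdFile *f)
{
    if (!f->exists) {
        f->exists = true;
        f->vfd.xact_transient = true;
    }
    f->vfd.kernel_fd_open = true;   /* kernel FD is (re)opened on demand */
}

/* Transaction end: close flagged kernel FDs and clear the flag.
 * The VFD and the md-level structure both survive. */
static void toy_at_xact_end(ToyMdFile *f)
{
    if (f->vfd.xact_transient) {
        f->vfd.kernel_fd_open = false;
        f->vfd.xact_transient = false;
    }
}

/* Runs two transactions; returns true if a kernel FD is still open. */
bool toy_leak_after_two_xacts(void)
{
    ToyMdFile f = {0};

    toy_blind_write(&f);            /* transaction 1 */
    toy_at_xact_end(&f);
    assert(!f.vfd.kernel_fd_open);  /* first transaction cleans up */

    toy_blind_write(&f);            /* transaction 2: flag is not set again */
    toy_at_xact_end(&f);

    return f.vfd.kernel_fd_open;    /* still open: the "leaked" FD */
}
```

In this model the second transaction leaves the descriptor open because
nothing re-marks the VFD once the md-level structure exists.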

(2) FlushBuffer will set the smgr-level transient flag even if we have
a relcache entry for the relation. (The fact that we're doing a blind
write to flush a dirty buffer does not prove that the rel is one we're
not interested in.) This can result in unnecessary forced closures of
kernel FDs, and it also results in too many scans of the VFD array,
because have_pending_fd_cleanup can get set unnecessarily.

(3) If the smgr-level flag gets cleared intra-transaction (i.e., we did
a blind write and later started doing normal accesses to the same
relation), the change fails to propagate to the VFD level, again
resulting in undesirable FD closures.

(4) After a blind write, we will close the kernel FD at transaction end,
but we don't flush the VFD array entry. This results in VFD array bloat
over time. The combination of this and (2) seems to explain the
performance problem I complained of above: there are too many VFD
searches done by CleanupTempFiles, and they have to pass over too many
useless entries.
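
To illustrate the bloat in (4), here is a rough toy model, not fd.c itself.
It assumes a fixed-size array with a high-water mark and one blind write to a
different file per transaction; ToySlot, ToyVfdArray and
toy_vfd_array_size_after are hypothetical names made up for the sketch.

```c
/* Toy model of VFD array bloat. NOT PostgreSQL code; all names are
 * hypothetical and the array handling is deliberately simplified. */
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_VFDS 64

typedef struct ToySlot {
    bool in_use;           /* array entry is allocated */
    bool kernel_fd_open;   /* a kernel descriptor is attached */
} ToySlot;

typedef struct ToyVfdArray {
    ToySlot slots[TOY_MAX_VFDS];
    int     size;          /* high-water mark: entries a cleanup scan visits */
} ToyVfdArray;

/* Reuse a free entry if there is one, otherwise extend the array. */
static int toy_alloc_slot(ToyVfdArray *a)
{
    for (int i = 0; i < a->size; i++) {
        if (!a->slots[i].in_use) {
            a->slots[i].in_use = true;
            return i;
        }
    }
    assert(a->size < TOY_MAX_VFDS);   /* toy limit, n_xacts must stay <= 64 */
    a->slots[a->size].in_use = true;
    return a->size++;
}

/* Simulates n_xacts transactions, each doing one blind write to a
 * different file. Returns the number of entries an end-of-transaction
 * scan would have to visit afterwards. */
int toy_vfd_array_size_after(int n_xacts, bool free_slot_at_xact_end)
{
    ToyVfdArray a = {0};

    for (int x = 0; x < n_xacts; x++) {
        int i = toy_alloc_slot(&a);
        a.slots[i].kernel_fd_open = true;    /* blind write */

        a.slots[i].kernel_fd_open = false;   /* transaction end: close FD */
        if (free_slot_at_xact_end)
            a.slots[i].in_use = false;       /* also release the entry */
    }
    return a.size;
}
```

With the entry kept, the array grows by one per transaction; with the entry
released, the same slot is reused and the scan length stays constant.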

I believe that we probably ought to revert this mechanism entirely, and
build a new implementation based on these concepts:

* An SMgrRelation is transient if and only if it doesn't have an
"owning" relcache entry. Keep a list of all such SMgrRelations, and
close them all at transaction end. (Obviously, an SMgrRelation gets
removed from the list if it acquires an owner mid-transaction.)

* There's no such concept as FD_XACT_TRANSIENT at the fd.c level.
Rather, we close and delete the VFD entry when told to by SMgrRelation
closure.
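
A minimal sketch of the first concept, to make the bookkeeping concrete. It
is only an illustration under my own assumptions: ToyRel, ToyUnownedList and
the toy_* functions are hypothetical, a doubly linked list is just one
possible representation, and "closing" is reduced to flipping a flag.

```c
/* Toy model of an "unowned relations" list. NOT PostgreSQL code; all
 * names are hypothetical and closing is reduced to clearing a flag. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyRel {
    const void    *owner;     /* non-NULL once something owns the relation */
    bool           is_open;
    bool           on_list;   /* currently linked into the unowned list */
    struct ToyRel *prev;
    struct ToyRel *next;
} ToyRel;

typedef struct ToyUnownedList {
    ToyRel *head;
} ToyUnownedList;

static void toy_list_add(ToyUnownedList *l, ToyRel *r)
{
    r->prev = NULL;
    r->next = l->head;
    if (l->head)
        l->head->prev = r;
    l->head = r;
    r->on_list = true;
}

static void toy_list_remove(ToyUnownedList *l, ToyRel *r)
{
    if (!r->on_list)
        return;
    if (r->prev)
        r->prev->next = r->next;
    else
        l->head = r->next;
    if (r->next)
        r->next->prev = r->prev;
    r->prev = r->next = NULL;
    r->on_list = false;
}

/* Open a relation with no owner: it is transient by definition. */
void toy_open_unowned(ToyUnownedList *l, ToyRel *r)
{
    r->owner = NULL;
    r->is_open = true;
    toy_list_add(l, r);
}

/* Acquiring an owner mid-transaction takes the relation off the list. */
void toy_set_owner(ToyUnownedList *l, ToyRel *r, const void *owner)
{
    assert(owner != NULL);
    toy_list_remove(l, r);
    r->owner = owner;
}

/* Transaction end: close every relation still unowned. Returns the count. */
int toy_at_xact_end(ToyUnownedList *l)
{
    int closed = 0;

    while (l->head) {
        ToyRel *r = l->head;
        toy_list_remove(l, r);
        r->is_open = false;   /* real code would release file handles here */
        closed++;
    }
    return closed;
}

/* Two relations are opened, one gains an owner; only the other is closed. */
int toy_demo_close_unowned(void)
{
    ToyUnownedList l = { NULL };
    ToyRel a, b;
    int owner_token = 0;      /* stands in for an owning cache entry */

    toy_open_unowned(&l, &a);
    toy_open_unowned(&l, &b);
    toy_set_owner(&l, &b, &owner_token);

    int closed = toy_at_xact_end(&l);
    assert(!a.is_open && b.is_open);
    return closed;
}
```

The point of the sketch is that transience is derived from ownership alone,
so there is no separate flag that can drift out of sync with it.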

Comments?

regards, tom lane
