I got a bit suspicious of the transient-file mechanism introduced in commit fba105b1099f4f5fa7283bb17cba6fed2baa8d0c after noticing that CleanupTempFiles seemed to take an unreasonable amount of time in a test case that didn't involve any temp files, cf http://archives.postgresql.org/message-id/7110.1349392...@sss.pgh.pa.us
After further review, I have become convinced that in fact it's completely broken and needs to be redone from scratch. The temp-file marking at the fd.c level can easily get out of sync with the marking at the smgr level, and that marking isn't too consistent with reality either, which means we have all of the following problems: (1) It can leak kernel descriptors, as reported in http://archives.postgresql.org/message-id/b9bea448-978f-4a14-a088-3fd82214f...@pvv.ntnu.no The triggering sequence for that appears to be: * Transaction 1 does a blind write, sets FD_XACT_TRANSIENT. * At transaction close, we close kernel FD and clear FD_XACT_TRANSIENT in the VFD, but the VFD, the smgr relation, and the md.c data structure are all still there. * Transaction 2 does another blind write on same file. This does not cause FD_XACT_TRANSIENT to get set because md.c data structure already exists. * Now we are carrying a "leaked" kernel FD that will never get closed short of a CacheInvalidateSmgr message. Which doesn't happen in a dropdb scenario. (That might be a bug in itself.) (2) FlushBuffer will set the smgr-level transient flag even if we have a relcache entry for the relation. (The fact that we're doing a blind write to flush a dirty buffer does not prove that the rel is one we're not interested in.) This can result in unnecessary forced closures of kernel FDs, and it also results in too many scans of the VFD array, because have_pending_fd_cleanup can get set unnecessarily. (3) If the smgr-level flag gets cleared intra-transaction (ie, we did a blind write and later started doing normal accesses to the same relation), this fails to propagate to the VFD level, again resulting in undesirable FD closures. (4) After a blind write, we will close the kernel FD at transaction end, but we don't flush the VFD array entry. This results in VFD array bloat over time. The combination of this and (2) seems to explain the performance problem I complained of above: there are too many VFD searches done by CleanupTempFiles, and they have to pass over too many useless entries. I believe that we probably ought to revert this mechanism entirely, and build a new implementation based on these concepts: * An SMgrRelation is transient if and only if it doesn't have an "owning" relcache entry. Keep a list of all such SmgrRelations, and close them all at transaction end. (Obviously, an SMgrRelation gets removed from the list if it acquires an owner mid-transaction.) * There's no such concept as FD_XACT_TRANSIENT at the fd.c level. Rather, we close and delete the VFD entry when told to by SmgrRelation closure. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers