On Wed, Jan 11, 2017 at 7:37 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > Hmm. Yes, that is an entirely bogus use of isInterXact. I am > thinking about how to fix that with refcounts.
Cool. As I said, the way I'd introduce refcounts would not be very different from what I've already done -- there'd still be a strong adherence to the use of resource managers to clean-up, with that including exactly one particular backend doing the extra step of deletion. The refcount only changes which backend does that extra step in corner cases, which is conceptually a very minor change. >> I don't think it's right for buffile.c to know anything about file >> paths directly -- I'd say that that's a modularity violation. >> PathNameOpenFile() is called by very few callers at the moment, all of >> them very low level (e.g. md.c), but you're using it within buffile.c >> to open a path to the file that you obtain from shared memory > > Hmm. I'm not seeing the modularity violation. buffile.c uses > interfaces already exposed by fd.c to do this: OpenTemporaryFile, > then FilePathName to find the path, then PathNameOpenFile to open from > another process. I see that your approach instead has client code > provide more meta data so that things can be discovered, which may > well be a much better idea. Indeed, my point was that the metadata thing would IMV be better. buffile.c shouldn't need to know about file paths, etc. Instead, caller should pass BufFileImport()/BufFileUnify() simple private state sufficient for routine to discover all details itself, based on a deterministic scheme. In my tuplesort patch, that piece of state is: /* + * BufFileOp is an identifier for a particular parallel operation involving + * temporary files. Parallel temp file operations must be discoverable across + * processes based on these details. + * + * These fields should be set by BufFileGetIdent() within leader process. + * Identifier BufFileOp makes temp files from workers discoverable within + * leader. + */ +typedef struct BufFileOp +{ + /* + * leaderPid is leader process PID. + * + * tempFileIdent is an identifier for a particular temp file (or parallel + * temp file op) for the leader. Needed to distinguish multiple parallel + * temp file operations within a given leader process. + */ + int leaderPid; + long tempFileIdent; +} BufFileOp; + > Right, that is a problem. A refcount mode could fix that; virtual > file descriptors would be closed in every backend using the current > resource owner, and the files would be deleted when the last one turns > out the lights. Yeah. That's basically what the BufFile unification process can provide you with (or will, once I get around to implementing the refcount thing, which shouldn't be too hard). As already noted, I'll also want to make it defer creation of a leader-owned segment, unless and until that proves necessary, which it never will for hash join. Perhaps I should make superficial changes to unification in my patch to suit your work, like rename the field BufFileOp.leaderPid to BufFileOp.ownerPid, without actually changing any behaviors, except as noted in the last paragraph. Since you only require that backends be able to open up some other backend's temp file themselves for a short while, that gives you everything you need. You'll be doing unification in backends, and not just within the leader as in the tuplesort patch, I believe, but that's just fine. All that matters is that you present all data at once to a consuming backend via unification (since you treat temp file contents as immutable, this will be true for hash join, just as it is for tuplesort). There is a good argument against my making such a tweak, however, which is that maybe it's clearer to DBAs what's going on if temp file names have the leader PID in them for all operations. So, maybe BufFileOp.leaderPid isn't renamed to BufFileOp.ownerPid by me; instead, you always make it the leader pid, while at the same time having the leader dole out BufFileOp.tempFileIdent identifiers to each worker as needed (see how I generate BufFileOps for an idea of what I mean if it's not immediately clear). That's also an easy change, or at least will be once the refcount thing is added. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers