On Thu, Mar 23, 2017 at 12:35 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > Thanks. I really appreciate your patience with the resource > management stuff I had failed to think through.
It's a surprisingly difficult problem, that almost requires prototyping just to explain. No need to apologize. This is the process by which many hard problems end up being solved. >> However, I notice that the place that this happens instead, >> PathNameDelete(), does not repeat the fd.c step of doing a final >> stat(), and using the stats for a pgstat_report_tempfile(). So, a new >> pgstat_report_tempfile() call is simply missing. However, the more >> fundamental issue in my mind is: How can you fix that? Where would it >> go if you had it? > > You're right. I may be missing something here (again), but it does > seem straightforward to implement because we always delete each file > that really exists exactly once (and sometimes we also try to delete > files that don't exist due to imprecise meta-data, but that isn't > harmful and we know when that turns out to be the case). ISTM that your patch now shares a quality with parallel tuplesort: You may now hold files open after an unlink() of the original link/path that they were opened using. As Robert pointed out when discussing parallel tuplesort earlier in the week, that comes with the risk, however small, that the vFD cache will close() the file out from under us during LRU maintenance, resulting in a subsequent open() (at the tail-end of the vFD's lifetime) that fails unexpectedly. It's probably fine to assume that we can sanely close() the file ourselves in fd.c error paths despite a concurrent unlink(), since we never operate on the link itself, and there probably isn't much pressure on each backend's vFD cache. But, is that good enough? I can't say, though I suspect that this particular risk is one that's best avoided. I haven't tested out how much of a problem this might be for your patch, but I do know that resowner.c will call your shared mem segment callback before closing any backend local vFDs, so I can't imagine how it could be that this risk doesn't exist. FWIW, I briefly entertained the idea that we could pin a vFD for just a moment, ensuring that the real FD could not be close()'d out by vfdcache LRU maintenance, which would fix this problem for parallel tuplesort, I suppose. That may not be workable for PHJ, because PHJ would probably need to hold on to such a "pin" for much longer, owing to the lack of any explicit "handover" phase. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers