On 2016-04-11 13:04:48 -0400, Robert Haas wrote:
> You're right, but I think that's more because I didn't say it
> correctly than because you haven't done something novel. DROP and
> relation truncation know about shared buffers, and they clear any
> blocks that might be affected out of them as part of the truncate
> operation, which means that no other backend will see them after they
> are gone. The lock makes sure that no other references can be added
> while we're busy removing any that are already there. So I think that
> there is currently an invariant that any block we are attempting to
> access should actually still exist.
Note that we're not actually accessing any blocks; we're just opening a
segment to get the associated file descriptor.
> It sounds like these references are sticking around in backend-private
> memory, which means they are neither protected by locks nor able to be
> cleared out on drop or truncate. I think that's a new thing, and a
> bit scary.
True. But how else would you batch flush requests in sorted order,
without re-opening file descriptors? And that's pretty essential for
performance.
I can think of a number of relatively easy ways to address this:
1) Just zap (or issue?) all pending flush requests when getting an
2) Do 1), but filter for the closed relnode
3) Actually handle the case of the last open segment being shorter than
RELSEG_SIZE properly in _mdfd_getseg(), as mdnblocks() does.
I'm kind of inclined to do both 3) and 1).
> The possibly-saving grace here, I suppose, is that the references
> we're worried about are just being used to issue hints to the
> operating system.
> So I guess if we sent a hint on the wrong block or
> skipped sending a hint altogether because of some failure, no harm done,
> as long as we don't error out.
Which the writeback code is careful not to do; afaics it's just the
"already open segment" issue causing problems here.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)