On Tue, Jul 30, 2019 at 4:02 AM Andres Freund <and...@anarazel.de> wrote:
> I'm a bit worried about expanding the use of
> ReadBufferWithoutRelcache(). Not so much because of the relcache itself,
> but because it requires doing separate smgropen() calls. While not
> crazily expensive, it's also not free. Especially combined with closing
> all such relations at transaction end (c.f. AtEOXact_SMgr).
> I'm somewhat inclined to think that this requires a slightly bigger
> refactoring than done in this patch. Imo at the very least the smgr
> entries ought not to be unowned. But working towards not haven to
> re-open the smgr entry for every single trival request ought to be part
> of this too.

I spent some time trying to analyze this today and I agree with you
that there seems to be room for improvement here. When I first looked
at your comments, I wasn't too convinced, because access patterns that
skip around between undo logs seem like they may be fairly common.
Admittedly, there are cases where we want to read from just one undo
log over and over again, and it would be good to optimize those, but I
was initially a bit unconvinced that that there was a problem here
worth being concerned about. Then I realized that you would also
repeat the smgropen() if you read a single record that happens to be
split across two pages, which seems a little silly.

But then I realized that we're being a little silly even in the case
where we're reading a single undo record that is stored entirely on a
single page.  We are certainly going to need to look up the undo log,
but as things stand, we'll basically do it twice. For example, in the
write path, we'll call UndoLogAllocate() and it will look up an
UndoLogControl object for the undo log of interest, and then we'll
call ReadBufferWithoutRelcache() which will call smgropen() which will
do a hash table lookup to find the SMgrRelation associated with that
undo log. That's not a large cost, as you say, but it does seem like
it might be better to avoid having two different lookups in the same
commonly-used code path, each of which peeks into a different
backend-private data structure for information about the very same
undo log.

The obvious thing to do seems to be to have UndoLogControl objects own
SmgrRelations. That would be something of a novelty, since it looks
like currently only a Relation ever owns an SMgrRelation, but the smgr
infrastructure seems to have been set up in a generic way so as to
permit that sort of thing, so it seems like it should be workable.
Perhaps UndoLogAllocate() function could return a pointer to the
UndoLogControl object as well as UndoRecPtr. Then, there could be a
function UndoLogWrite(UndoLogControl *, UndoRecPtr, char *, Size).  On
the read side, instead of calling UndoRecPtrAssignRelFileNode, maybe
the undo log storage layer should provide a function that again
returns an UndoLogControl, and then we could have a matching function
UndoLogRead(UndoLogControl *, UndoRecPtr, char *, Size).

I think this kind of design would address your concerns about using
the unowned list, too, since the UndoLogControl objects would be
owning the SMgrRelations.  It took me a while to understand why you
were concerned about using the unowned list, so I'm going to repeat it
in my own words to make sure I've got it right, and also to possibly
help out anyone else who may also have had difficulty grokking your
concern.  If we have a bunch of short transactions each of which
accesses the same relation, the relcache entry will remain open and
the file won't get closed in between, but if we have a bunch of short
transactions each of which accesses the same undo log, the undo log
will be closed and reopened at the operating system level for each
individual transaction. That happens because when an SMgrRelation is
"owned," the owner takes care of closing it, and so can keep it open
across transactions, but when it's "unowned," it's automatically
closed during transaction cleanup.  And we should fix it, because
closing and reopening the same file for every transaction
unnecessarily might be expensive enough to matter, at least a little

How does all that sound?

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply via email to