On Tue, Jul 30, 2019 at 4:02 AM Andres Freund <and...@anarazel.de> wrote: > I'm a bit worried about expanding the use of > ReadBufferWithoutRelcache(). Not so much because of the relcache itself, > but because it requires doing separate smgropen() calls. While not > crazily expensive, it's also not free. Especially combined with closing > all such relations at transaction end (c.f. AtEOXact_SMgr). > > I'm somewhat inclined to think that this requires a slightly bigger > refactoring than done in this patch. Imo at the very least the smgr > entries ought not to be unowned. But working towards not haven to > re-open the smgr entry for every single trival request ought to be part > of this too.
I spent some time trying to analyze this today and I agree with you that there seems to be room for improvement here. When I first looked at your comments, I wasn't too convinced, because access patterns that skip around between undo logs seem like they may be fairly common. Admittedly, there are cases where we want to read from just one undo log over and over again, and it would be good to optimize those, but I was initially a bit unconvinced that that there was a problem here worth being concerned about. Then I realized that you would also repeat the smgropen() if you read a single record that happens to be split across two pages, which seems a little silly. But then I realized that we're being a little silly even in the case where we're reading a single undo record that is stored entirely on a single page. We are certainly going to need to look up the undo log, but as things stand, we'll basically do it twice. For example, in the write path, we'll call UndoLogAllocate() and it will look up an UndoLogControl object for the undo log of interest, and then we'll call ReadBufferWithoutRelcache() which will call smgropen() which will do a hash table lookup to find the SMgrRelation associated with that undo log. That's not a large cost, as you say, but it does seem like it might be better to avoid having two different lookups in the same commonly-used code path, each of which peeks into a different backend-private data structure for information about the very same undo log. The obvious thing to do seems to be to have UndoLogControl objects own SmgrRelations. That would be something of a novelty, since it looks like currently only a Relation ever owns an SMgrRelation, but the smgr infrastructure seems to have been set up in a generic way so as to permit that sort of thing, so it seems like it should be workable. Perhaps UndoLogAllocate() function could return a pointer to the UndoLogControl object as well as UndoRecPtr. Then, there could be a function UndoLogWrite(UndoLogControl *, UndoRecPtr, char *, Size). On the read side, instead of calling UndoRecPtrAssignRelFileNode, maybe the undo log storage layer should provide a function that again returns an UndoLogControl, and then we could have a matching function UndoLogRead(UndoLogControl *, UndoRecPtr, char *, Size). I think this kind of design would address your concerns about using the unowned list, too, since the UndoLogControl objects would be owning the SMgrRelations. It took me a while to understand why you were concerned about using the unowned list, so I'm going to repeat it in my own words to make sure I've got it right, and also to possibly help out anyone else who may also have had difficulty grokking your concern. If we have a bunch of short transactions each of which accesses the same relation, the relcache entry will remain open and the file won't get closed in between, but if we have a bunch of short transactions each of which accesses the same undo log, the undo log will be closed and reopened at the operating system level for each individual transaction. That happens because when an SMgrRelation is "owned," the owner takes care of closing it, and so can keep it open across transactions, but when it's "unowned," it's automatically closed during transaction cleanup. And we should fix it, because closing and reopening the same file for every transaction unnecessarily might be expensive enough to matter, at least a little bit. How does all that sound? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company