On Thu, Aug 27, 2020 at 6:15 AM Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> > --4.90%--smgropen
> >    |--2.86%--ReadBufferWithoutRelcache
>
> Looking at an earlier report of this problem I was thinking whether it'd
> make sense to replace SMgrRelationHash with a simplehash table; I have a
> half-written patch for that, but I haven't completed that work.
> However, in the older profile things were looking different, as
> hash_search_with_hash_value was taking 35.25%, and smgropen was 33.74%
> of it. BufTableLookup was also there but only 1.51%. So I'm not so
> sure now that that'll pay off as clearly as I had hoped.

Right, my hypothesis requires an uncacheably large buffer mapping
table, and I think smgropen() needs a different explanation because
its hash table is not expected to be as large or as randomly accessed,
at least not with a pgbench workload. I think the reasons for a
profile with smgropen() showing up so high, and in particular higher
than BufTableLookup(), must be:

1. We call smgropen() twice for every call to BufTableLookup(). Once
in XLogReadBufferExtended(), and then again in
ReadBufferWithoutRelcache().
2. We also call it for every block forced out of the buffer pool, and
in recovery that has to be done by the recovery loop.
3. We also call it for every block in the buffer pool during the
end-of-recovery checkpoint.

Not sure, but the last two might perform worse due to proximity to
interleaving pwrite() system calls (just a thought, not investigated).

In any case, I'm going to propose we move those things out of the
recovery loop, in a new thread.
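
To make the call-count argument concrete, here is a toy model in C. It
is not PostgreSQL code: every model_* function and the LookupCounts
struct are hypothetical stand-ins that only increment counters. They
mirror the call sites named above and nothing else the real functions
do.

```c
#include <assert.h>

/* Hypothetical counters; nothing here is taken from PostgreSQL. */
typedef struct LookupCounts
{
    unsigned long smgr_lookups;   /* stand-in for smgropen() hash probes */
    unsigned long buf_lookups;    /* stand-in for BufTableLookup() probes */
} LookupCounts;

static void model_smgropen(LookupCounts *c) { c->smgr_lookups++; }
static void model_buf_table_lookup(LookupCounts *c) { c->buf_lookups++; }

/* Models ReadBufferWithoutRelcache(): one smgropen(), one buffer lookup. */
static void
model_read_buffer_without_relcache(LookupCounts *c)
{
    model_smgropen(c);
    model_buf_table_lookup(c);
}

/* Models XLogReadBufferExtended(): its own smgropen(), then the call above. */
static void
model_xlog_read_buffer_extended(LookupCounts *c)
{
    model_smgropen(c);
    model_read_buffer_without_relcache(c);
}

/*
 * Count lookups for a recovery run with the given (made-up) numbers of
 * replayed block references, forced evictions, and blocks written by the
 * end-of-recovery checkpoint.
 */
LookupCounts
model_recovery(unsigned long block_refs,
               unsigned long evictions,
               unsigned long checkpoint_blocks)
{
    LookupCounts c = {0, 0};
    unsigned long i;

    for (i = 0; i < block_refs; i++)
        model_xlog_read_buffer_extended(&c);    /* point 1: two per lookup */
    for (i = 0; i < evictions; i++)
        model_smgropen(&c);                     /* point 2: forced write-out */
    for (i = 0; i < checkpoint_blocks; i++)
        model_smgropen(&c);                     /* point 3: checkpoint write */

    assert(c.buf_lookups == block_refs);
    return c;
}
```

With invented inputs of 1000 block references, 200 evictions and 300
checkpoint blocks, model_recovery() counts 2500 smgropen() calls
against 1000 buffer mapping lookups. The model says nothing about the
cost of each call, only about how often each is made.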