Hi,

On 2025-02-01 15:43:41 +0100, Ants Aasma wrote:
> On Fri, Jan 31, 2025, 15:43 Andres Freund <and...@anarazel.de> wrote:
> > > Maybe it's a red herring though, but it looks pretty suspicious.
> >
> > It's unfortunately not too surprising - our buffer mapping table is a
> > pretty big bottleneck. Both because a hash table is just not a good fit
> > for the buffer mapping table due to the lack of locality, and because
> > dynahash is a really poor hash table implementation.
>
> I measured similar things when looking at apply throughput recently. For
> in-cache workloads, buffer lookup and locking was about half of the load.
>
> One other direction is to extract more memory concurrency. A prefetcher
> could batch multiple lookups together, so CPU OoO execution has a chance
> to fire off multiple memory accesses at the same time.
I think at the moment we have a *hilariously* cache-inefficient buffer
lookup; that's the first thing to address.

A hash table for buffer mapping lookups imo is a bad idea, due to losing
all locality in a workload that exhibits a *lot* of locality. But
furthermore, dynahash.c is very far from a cache-efficient hashtable
implementation.

The other aspect is that in many workloads we'll look up a small set of
buffers over and over, which a) wastes cycles b) wastes cache space on
lookups that could be elided much more efficiently.

We also do a lot of hash lookups for smgr, because we don't have any
cross-record caching infrastructure for that.

> The other direction is to split off WAL decoding, buffer lookup and maybe
> even pinning to a separate process from the main redo loop.

Maybe, but I think we're rather far away from those things being the most
productive thing to tackle.

Greetings,

Andres Freund