Hi,

On 2025-04-02 11:36:33 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > Looking at the size of BTScanOpaqueData I am less surprised:
> > /* size: 27352, cachelines: 428, members: 17 */
> > allocating, zeroing and freeing 28kB of memory for every syscache miss, yea,
> > that's gonna hurt.
>
> Ouch!  I had no idea it had gotten that big.  Yeah, we ought to
> do something about that.

It got a bit bigger a few years back, in

commit 0d861bbb702
Author: Peter Geoghegan <p...@bowt.ie>
Date:   2020-02-26 13:05:30 -0800

    Add deduplication to nbtree.

Because the posting list is a lot more dense, more items can be stored on each
page.

Not that it was small before either:

	BTScanPosData              currPos __attribute__((__aligned__(8))); /*    88  4128 */
	/* --- cacheline 65 boundary (4160 bytes) was 56 bytes ago --- */
	BTScanPosData              markPos __attribute__((__aligned__(8))); /*  4216  4128 */

	/* size: 8344, cachelines: 131, members: 16 */
	/* sum members: 8334, holes: 3, sum holes: 10 */
	/* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
	/* last cacheline: 24 bytes */
} __attribute__((__aligned__(8)));

But obviously ~3.2x can qualitatively change something.


> > And/or perhaps we could allocate BTScanOpaqueData.markPos as a whole
> > only when mark/restore are used?
>
> That'd be an easy way of removing about half of the problem, but
> 14kB is still too much.  How badly do we need this items array?
> Couldn't we just reference the on-page items?

I think that'd require acquiring the buffer lock and/or pin more frequently.
But I know very little about nbtree.

I'd assume it's extremely rare for there to be this many items on a page. I'd
guess that something like having BTScanPosData->items point to a small in-line
array of 4-16 entries (BTScanPosItem items_inline[N]), and dynamically
allocating a full-length BTScanPosItem[MaxTIDsPerBTreePage] just in the cases
where it's needed, would work.


I'm a bit confused by the "MUST BE LAST" comment:
	BTScanPosItem items[MaxTIDsPerBTreePage];	/* MUST BE LAST */

Not clear why? Seems to be from rather long back:

commit 09cb5c0e7d6
Author: Tom Lane <t...@sss.pgh.pa.us>
Date:   2006-05-07 01:21:30 +0000

    Rewrite btree index scans to work a page at a time in all cases (both


Greetings,

Andres Freund
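
To make the inline-array idea above a bit more concrete, here is a rough,
standalone sketch. It is not nbtree code: every Sketch* name, the item layout,
the inline capacity of 16 and the value used for the full-length size are
assumptions made up for illustration, and it uses malloc/free rather than
palloc/pfree so that it compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real definitions live in nbtree.h. */
#define SKETCH_INLINE_ITEMS 16		/* assumed inline capacity ("4-16") */
#define SKETCH_MAX_ITEMS	1358	/* assumed stand-in for MaxTIDsPerBTreePage */

typedef struct SketchItem
{
	unsigned int heap_block;		/* illustrative fields only */
	unsigned short heap_offset;
	unsigned short index_offset;
} SketchItem;

typedef struct SketchScanPos
{
	int			nitems;				/* entries currently stored */
	int			capacity;			/* entries items[] can hold right now */
	SketchItem *items;				/* items_inline, or a malloc'd array */
	SketchItem	items_inline[SKETCH_INLINE_ITEMS];
} SketchScanPos;

/* Start out using only the small inline array. */
void
sketch_pos_init(SketchScanPos *pos)
{
	pos->nitems = 0;
	pos->capacity = SKETCH_INLINE_ITEMS;
	pos->items = pos->items_inline;
}

bool
sketch_pos_is_inline(const SketchScanPos *pos)
{
	return pos->items == pos->items_inline;
}

/* Append one item, spilling to a full-length array on first overflow. */
bool
sketch_pos_add(SketchScanPos *pos, SketchItem item)
{
	assert(pos->nitems < SKETCH_MAX_ITEMS);

	if (pos->nitems == pos->capacity)
	{
		/* Only reachable while inline: afterwards capacity is the maximum. */
		SketchItem *full = malloc(sizeof(SketchItem) * SKETCH_MAX_ITEMS);

		if (full == NULL)
			return false;
		memcpy(full, pos->items_inline, sizeof(SketchItem) * pos->nitems);
		pos->items = full;
		pos->capacity = SKETCH_MAX_ITEMS;
	}
	pos->items[pos->nitems++] = item;
	return true;
}

/* Release the full-length array, if any, and return to the inline one. */
void
sketch_pos_reset(SketchScanPos *pos)
{
	if (!sketch_pos_is_inline(pos))
		free(pos->items);
	sketch_pos_init(pos);
}
```

With this shape the common case costs only the small inline array per scan
position, and the large allocation happens only for pages that actually carry
many items. One thing a layout like this has to watch out for: the struct
holds a pointer into itself, so copying it wholesale (say, one position into
another with memcpy) would leave the copy's items pointing at the source and
would need a fix-up.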