On Tue, Mar 22, 2022 at 4:06 PM Andres Freund <and...@anarazel.de> wrote: > Are you thinking of just moving the setup stuff in nbtree (presumably parts of > _bt_first() / _bt_preprocess_keys()) or also stuff in > ExecIndexBuildScanKeys()? > > The latter does show up a bit more heavily in profiles than nbtree specific > setup, and given that it's generic executor type stuff, seems even more > amenable to being moved to plan time.
When I was working on the patch series that became the nbtree Postgres 12 work, this came up. At one point I discovered that using palloc0() for the insertion scankey in _bt_first() was a big problem with nested loop joins -- it became a really noticeable bottleneck with one of my test cases. I independently discovered what Tom must have figured out back in 2005, when he committed d961a56899. This led to my making the new insertion scan key structure (BTScanInsertData) not use dynamic allocation. So _bt_first() is definitely performance critical for certain types of queries. We could get rid of dynamic allocations for BTStackData in _bt_first(), perhaps. The problem is that there is no simple, reasonable proof of the maximum height on a B-tree, even though a B-Tree with more than 7 or 8 levels seems extraordinarily unlikely. You could also invent a slow path (maybe do what we do in _bt_insert_parent() in the event of a concurrent root page split/NULL stack), but that runs into the problem of being awkward to test, and pretty ugly. It's doable, but I wouldn't do it unless there was a pretty noticeable payoff. -- Peter Geoghegan