Hi,

On 2025-11-21 18:14:56 -0500, Peter Geoghegan wrote:
> On Fri, Nov 21, 2025 at 5:38 PM Andres Freund <[email protected]> wrote:
> > Another benefit is that it helps even more when there are multiple queries
> > running concurrently - the high rate of lock/unlock on the buffer rather
> > badly hurts scalability.
> 
> I haven't noticed that effect myself. In fact, it seemed to be the
> other way around; it looked like it helped most with very low client
> count workloads.

It's possible that that effect is more visible on larger machines - I tested
on a 2x 24 cores / 48 threads machine. On a 2x 10 cores / 20 threads machine I
do see a smaller effect.


> It's possible that that had something to do with my hacky approach to
> validating the general idea of optimizing heapam buffer
> locking/avoiding repeated locking. This was a very rough prototype.

Heh, mine also was a dirty dirty hack. So....


> > Besides the locking overhead, it turns out that doing visibility checks
> > one-by-one is a good bit slower than doing so in batches (or for the whole
> > page). So that's another perf improvement this would enable.
> 
> Isn't that just what you automatically get by only locking once per
> contiguous group of TIDs that all point to the same heap page?

No, what I mean is to actually enter heapam_visibility.c once for a set of
tuples. That allows doing some expensive-ish stuff once per page, instead of
doing it repeatedly, and allows for more out-of-order execution, as the loop
is a lot tighter.  See here for my patch to do that for sequential scans:

https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr%40vw27225tkyar

Basically, instead of calling HeapTupleSatisfiesVisibility() individually for
each tuple, you call HeapTupleSatisfiesMVCCBatch() once for all the tuples
that you want to determine visibility for.
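
In caller terms the difference looks roughly like this (paraphrasing the WIP
patch, so the exact batch signature may differ from what's shown here):

    Snapshot       snapshot;   /* the scan's snapshot */
    Buffer         buffer;     /* heap page the tuples came from, already locked */
    HeapTupleData *tuples;     /* the tuples to check */
    int            ntuples;
    bool          *valid;      /* per-tuple visibility result */

    /* today: enter heapam_visibility.c separately for every tuple */
    for (int i = 0; i < ntuples; i++)
        valid[i] = HeapTupleSatisfiesVisibility(&tuples[i], snapshot, buffer);

    /*
     * with the patch: one call for the whole set, so per-page setup happens
     * once and the per-tuple loop inside heapam_visibility.c stays tight
     * (signature paraphrased - see the patch for the real details)
     */
    HeapTupleSatisfiesMVCCBatch(snapshot, buffer, ntuples, tuples, valid);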


> > Yes, I think that's clearly required.  I think one nice bonus of such a 
> > change
> > is that it'd resolve one of the biggest existing layering violations around
> > tableam - namely that nodeIndexonlyscan.c does VM_ALL_VISIBLE() calls, which
> > it really has no business doing.
> 
> Right. One relevant artefact of that layering violation is the way
> that it forces index I/O prefetching (as implemented in the current
> draft patch) to cache visibility lookup info. But with an I/O
> prefetching design that puts exactly one place (namely the new table
> AM index scan implementation) in charge of everything, that is no
> longer necessary.

Yep.
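
(For reference, the offending check in nodeIndexonlyscan.c is roughly the
following - paraphrased, the surrounding details elided:)

    if (!VM_ALL_VISIBLE(scandesc->heapRelation,
                        ItemPointerGetBlockNumber(tid),
                        &node->ioss_VMBuffer))
    {
        /* not known all-visible: the executor node itself falls back to
         * fetching the heap tuple to check visibility */
        ...
    }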


> > I wonder if we could actually do part of the redesign in an even more
> > piecemeal fashion:
> >
> > 1) Move the responsibility for getting the next tid from the index into
> >    tableam, but do so by basically using index_getnext_tid().
> 
> I would prefer it if the new table AM interface was able to totally
> replace the existing one, for all types of index scans that currently
> use amgettuple. Individual table AMs would generally be expected to
> fully move over to the new interface in one go.

Right.


> That means that we'll need to have index_getnext_tid() support built
> into the heapam implementation of said new interface anyway. We'll need
> it so that it is compatible with index AMs that still use amgettuple
> (i.e. that haven't switched over to amgetbatch). Because switching
> over to the amgetbatch interface isn't going to happen with every
> index AM in a single release -- that definitely isn't practical.

> Anyway, I don't see that much point in doing just step 1 in a single
> release. If we don't use amgetbatch in some fashion, then we risk
> committing something that solves the wrong problem.

I'm not actually suggesting that we do all these steps in separate releases or
such, just that we can get them committed individually. The nice thing about
my step 1) is that it would not require any indexam changes...
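
To make the shape of 1) a bit more concrete - the heapam side could be little
more than a wrapper around the loop that currently lives outside tableam. A
rough sketch (the callback name is invented here, just to illustrate):

#include "postgres.h"

#include "access/genam.h"
#include "access/relscan.h"
#include "executor/tuptable.h"

/* hypothetical tableam callback, name made up for illustration */
static bool
heapam_index_getnext_slot(IndexScanDesc scan, ScanDirection direction,
                          TupleTableSlot *slot)
{
    /* essentially the loop from index_getnext_slot(), moved below tableam */
    while (index_getnext_tid(scan, direction) != NULL)
    {
        if (index_fetch_heap(scan, slot))
            return true;        /* found a visible tuple for this TID */
        /* tuple not visible / dead - continue with the next TID */
    }

    return false;
}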



> I think that it's essential that the design of amgetbatch be able to
> accommodate reading leaf pages that are ahead of the current leaf page,
> to maintain heap I/O prefetch distance with certain workloads. But I
> don't think it has to do it in the first committed version.

Agreed and agreed.

Greetings,

Andres Freund

