On Sun, Jun 1, 2025 at 11:01 PM John Naylor <[email protected]> wrote:
>
> On Thu, May 1, 2025 at 5:36 AM Masahiko Sawada <[email protected]> wrote:
>
> > HEAD PATCHED DIFF
> > case-1: 3,021 ms 2.818 ms 93.29%
> > case-2: 5, 697 ms 5.545 ms 97.34%
> > case-3: 2,833 ms 2.790 ms 98.48%
> > case-4: 2,564 ms 2.279 ms 88.86%
> > case-5: 4,657 ms 4.706 ms 101.04%
>
> 1 and 4 look significant -- do the other cases have reproducible
> differences or is it just noise?
These results are the average of 3 times executions, so they are
reproducible differences.
>
> > Here are the summary of each attached patch:
> >
> > 0001: Introduce TIdStoreIsMemberMulti() where we can do IsMember check
> > for multiple TIDs in one function call. If the given TIDs are sorted
> > (at least in block number), we can save radix tree lookup for the same
> > page entry.
>
> My only comment is that TidStoreIsMember() is now unused in core (or
> maybe just the tests?). It seems like we could just change the API for
> it rather than introduce a new function?
Good point, changed in the latest patch I posted[1].
>
> > 0003: Use batch TIDs lookup in btree index bulk-deletion.
> >
> > In patch 0003, we implement batch TID lookups for both each
> > deduplicated index tuple and remaining all regular index tuples, which
> > seems to be the most straightforward approach.
>
> Seems like a good approach. btvacuumpage() needs to sort if there is a
> mix of posting tuples and regular index tuples. Was that covered by
> any of the tests above?
Good point, I think this case was not covered. I've measured the
performance with the following queries:
# Case-6:
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select (2 * c - 1) from generate_series(1, ${NROWS}) c;
insert into test select c from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where c < (${NROWS} * 0.3)::int
vacuum test;
And here is the result:
HEAD PATCHED DIFF
case-6: 3,320 ms 3,617 ms 108.94%
I'll consider how to deal with overheads of sorting TIDs.
> > While further
> > optimizations are possible, such as performing batch TID lookups for
> > all index tuples on a single page, these could introduce additional
> > overhead from sorting and re-sorting TIDs. Moreover, considering that
> > radix tree lookups are relatively inexpensive, the benefits of sorting
> > TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless,
> > these potential optimizations warrant further evaluation to determine
> > their actual impact on performance.
>
> My guess is that always sorting by TID and than back by index tuple
> offset is too much overhead to be worth it, but I'm not sure.
Agreed. Given the above test results, it's unlikely always sorting the
array helps speedups.
Regards,
[1]
https://www.postgresql.org/message-id/CAD21AoAv55DhJ%2B19zaemx-_eO7z%2Bu4gtFmeADsMBFqtHhyUySQ%40mail.gmail.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com