Hi,

On Tue, Jun 2, 2026 at 8:38 PM Andres Freund <[email protected]> wrote:
>
> Hi,
>
> On 2026-06-01 12:01:58 +0200, Jakub Wartak wrote:
> > So I've spent half of day on trying to see what makes the tests so slow at
> > least in my case. I can also confirm %CPU combined (with high 33% sys).
>
> Was this locally on your machine?  I assume that's without enabling
> sanitizers?

Yup.

> In CI the bottleneck clearly is CPU at the moment, due to the relatively low
> number of cores.
>
> To reduce IO, one pretty significant thing we can do is to reduce the segment
> size used during tests. Creating lots of 16MB segments when most of them are
> only very partially used isn't free.

Right, saw that, nice.

> > 0. baseline was ~71s (stuff already hot)
> > 1a. down to 64s with dirtywriteback tune (and mostly to avoid NVMe/SSD wear)
> > 1b. ~65s with tmpfs, so I've left using dirtywriteback sysctls:
> >     sudo mount -t tmpfs -o size=4G,uid=XXX,mode=755  tmpfs build/tmp_install
> >     sudo mount -t tmpfs -o size=16G,uid=XXX,mode=755 tmpfs /build/testrun
>
> I don't think we should do that, real FS behaviour is something we do IMO want
> to test.

Ack.

> >      1,100   pg_upgrade
> >        896   isolation
> >        694   pg_dump
> >        682   pg_basebackup
> >
> >     Fixing above subscription to ~5000 conns did not gain much (well it
> >     saved 5% of runtime 43s -> 41s). It's literally 10k lines of
> >     s/$node_subscriber->safe_psql/sub_bg->query_safe/g across dozens of
> >     files in src/test/subscription/t/. Too big for review and I'm not
> >     sharing as it could contain errors.
>
> Did you test the effect of those changes on windows (via CI)? I'd expect that
> big a reduction to have a substantially bigger effect there.

No, I did not, and I've already wiped the changes. It was just a probe for
any simple quick wins...

> > 5. Spotted that we do plenty of initdb and cached-initdb (cp), so I had idea
> >    about XFS's cp reflinks=always in build/, but I couldn't do that without
> >    /dev/loop, so apparently XFS (reflink=1) vs ext4(reflink=0) halves number
> >    of writes while even still on /dev/loop device, but that somehow
> >    does not directly contribute to duration of the test (well we are
> >    bottlenecked on CPU anyway, so this is just smarter? way of avoiding I/O;
> >    maybe with cold-caches and on real VMs running with XFS would be faster)
> >
> >    +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> >    @@ -687,7 +687,13 @@ sub init
> >                   }
> >                   else
> >                   {
> >    -                       @copycmd = qw(cp -RPp);
> >    +                       @copycmd = qw(cp --reflink=always -RPp);
>
> Afaict cp uses reflinks automatically by default, if the filesystem supports
> it.  On CI it's not supported due to ext4, but locally it seems to work for
> me.

Yeah, it does. I just wanted to be doubly sure, but then realized that on CI
we are on overlayfs on top of the host's ext4 :( It's a pity, because that cp
could be instant (as could CREATE DATABASE with file_copy_method=clone). Even
with --wal-segsize=1 an empty cluster takes ~32MB (3x8MB), and even rough
estimates of the number of (cached) initdb calls give huge numbers:

$ grep -r -A 5 'PostgreSQL::Test::Cluster->new' src contrib |
    grep -Po '\->init[a-z_]*' | sort | uniq -c
    341 ->init
     98 ->init_from_backup

So that's something like 400 * 32MB = 12800 MB? But I get the point of using
a real fs; it's just that we should have some option of using throwaway
filesystems (maybe we even do, but only on own/dedicated runners).
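
For the record, here is the same back-of-the-envelope calculation with the
exact counts from the grep above instead of the rounded 400. It is only a
sketch: the 32MB per cluster is my rough figure from above, and it assumes
that every ->init / ->init_from_backup call site copies one whole cluster
exactly once, which is surely not quite true (loops, skipped tests, etc.):

```shell
# Counts taken from the grep output above.
init_calls=341          # ->init
backup_calls=98         # ->init_from_backup

# Assumption: rough size of an empty cluster with --wal-segsize=1, in MB.
cluster_mb=32

# Assumption: each call site copies one full cluster exactly once.
total_calls=$((init_calls + backup_calls))
total_mb=$((total_calls * cluster_mb))

echo "${total_calls} cluster copies * ${cluster_mb} MB = ${total_mb} MB"
```

That comes out at 439 copies and 14048 MB, so the rounded number above is,
if anything, slightly on the low side.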

-J.

