Re: Test timings are increasing too fast for cfbot

2026-03-26 Thread Andres Freund
On 2026-03-26 11:09:36 +1300, David Rowley wrote:
> On Wed, 25 Mar 2026 at 16:15, Andres Freund  wrote:
> > Code wise, the most immediately noticeable things are
> 
> > 3) verify_compact_attribute(), pretty spread around
> 
> We do now have TupleDescFinalize(), where those could be checked just
> once rather than on each call to TupleDescCompactAttr(). That's not
> quite as watertight a guarantee as someone could change the
> FormData_pg_attribute after TupleDescFinalize(). Just doing it in
> TupleDescFinalize() would at least still catch the places where people
> forget to call populate_compact_attribute() before
> TupleDescFinalize().

Maybe verify_compact_attribute() could just do an assert comparison between
the underlying non-compact attribute and the compact one? Or even just an
assert checking if the compact attribute is initialized, with the full
checking happening in TupleDescFinalize(), as you suggest?


> I would have expected this to be a little less overhead now since
> d8a859d22 removed the calls to TupleDescCompactAttr() in the main
> deforming routine. Maybe I should just make that change in the other
> deformers...?

Might be worth it.


> Do you have an idea of which callers of verify_compact_attribute() are
> causing the most overhead?

I'm not sure how much to believe the profile when the costs are as distributed
as they are here. But according to the profile it's
- heap_form_tuple
- heap_form_minimal_tuple
- index_getattr
- nocachegetattr

Greetings,

Andres Freund




Re: Test timings are increasing too fast for cfbot

2026-03-26 Thread Andres Freund
Hi,

On 2026-03-25 12:36:12 +0800, Julien Rouhaud wrote:
> On Tue, Mar 24, 2026 at 11:15:32PM -0400, Andres Freund wrote:
> >
> > A surprisingly large source of this is psql internal queries.  I first ran 
> > the
> > regression tests with pg_stat_statements activated [1] and noticed that a 
> > lot
> > of the queries, by exec and or plan time, are psql queries.
> >
> > [1] which annoyingly generates a bunch of test failures due to "Query
> > Identifier" getting added to tests
> 
> FWIW `compute_query_id = regress` is supposed to avoid that problem.

Hm. I guess that should work. Thanks!

It's pretty annoying that one has to manually specify it to keep the tests
passing.  I wonder if we should make pg_regress specify it to avoid that.

Greetings,

Andres Freund




Re: Test timings are increasing too fast for cfbot

2026-03-25 Thread David Rowley
On Wed, 25 Mar 2026 at 16:15, Andres Freund  wrote:
> Code wise, the most immediately noticeable things are

> 3) verify_compact_attribute(), pretty spread around

We do now have TupleDescFinalize(), where those could be checked just
once rather than on each call to TupleDescCompactAttr(). That's not
quite as watertight a guarantee as someone could change the
FormData_pg_attribute after TupleDescFinalize(). Just doing it in
TupleDescFinalize() would at least still catch the places where people
forget to call populate_compact_attribute() before
TupleDescFinalize().

I would have expected this to be a little less overhead now since
d8a859d22 removed the calls to TupleDescCompactAttr() in the main
deforming routine. Maybe I should just make that change in the other
deformers...? Do you have an idea of which callers of
verify_compact_attribute() are causing the most overhead?

David




Re: Test timings are increasing too fast for cfbot

2026-03-25 Thread Andres Freund
Hi,

On 2026-03-25 15:38:01 +0100, Tomas Vondra wrote:
> On 3/25/26 04:15, Andres Freund wrote:
> > ...
> > 
> > The slowest test is stats_ext.sql - Not surprising, it does sequential scans
> > of tables with ~1000-1 rows over and over again.  I don't see why it has
> > to do that with as many rows as it does.
> > 
> 
> IIRC we needed to use non-trivial amounts of data to ensure building the
> right right type of statistics (e.g. no MCV). But I understand it can be
> annoyingly expensive, so I'll try to make it cheaper.

Great.  It might also be worth checking if any of the query conditions can be
made cheaper without actually changing what they do test.


> >  ...
> > 2) AssertCheckRanges() (only in the brin test, but there a very large 
> > portion
> >of the runtime)
> 
> True. It is a very comprehensive validation of the ranges, and it was
> very useful - particularly during development. But I'll try to make it
> more targeted at the stuff actually changed / called less often.

Makes sense.  There might also be some changs that make it faster without
loosing any coverage. E.g. not using FunctionCall2Coll() - which initializes
stuff on every call - but doing InitFunctionCallInfoData() once and then
update arguments + FunctionCallInvoke() for each call.


> Both changes will require time (so that we don't lose test coverage),
> but I assume it's OK if that happens sometime after the feature freeze.

Yea. We're not running out of credits tomorrow or something like that.

Greetings,

Andres Freund




Re: Test timings are increasing too fast for cfbot

2026-03-25 Thread Tomas Vondra
On 3/25/26 04:15, Andres Freund wrote:
> ...
> 
> The slowest test is stats_ext.sql - Not surprising, it does sequential scans
> of tables with ~1000-1 rows over and over again.  I don't see why it has
> to do that with as many rows as it does.
> 

IIRC we needed to use non-trivial amounts of data to ensure building the
right right type of statistics (e.g. no MCV). But I understand it can be
annoyingly expensive, so I'll try to make it cheaper.

>  ...
> 2) AssertCheckRanges() (only in the brin test, but there a very large portion
>of the runtime)

True. It is a very comprehensive validation of the ranges, and it was
very useful - particularly during development. But I'll try to make it
more targeted at the stuff actually changed / called less often.

Both changes will require time (so that we don't lose test coverage),
but I assume it's OK if that happens sometime after the feature freeze.


regards

-- 
Tomas Vondra





Re: Test timings are increasing too fast for cfbot

2026-03-24 Thread Julien Rouhaud
Hi,

On Tue, Mar 24, 2026 at 11:15:32PM -0400, Andres Freund wrote:
>
> A surprisingly large source of this is psql internal queries.  I first ran the
> regression tests with pg_stat_statements activated [1] and noticed that a lot
> of the queries, by exec and or plan time, are psql queries.
>
> [1] which annoyingly generates a bunch of test failures due to "Query
> Identifier" getting added to tests

FWIW `compute_query_id = regress` is supposed to avoid that problem.




Re: Test timings are increasing too fast for cfbot

2026-03-24 Thread Andres Freund
Hi,

On 2026-03-24 17:30:46 -0400, Andres Freund wrote:
> Due to the number of times we run the main regression tests, they have a
> particularly large effect on test resources.
>
>  time cycles   syscalls
> 17   37.13239.2T   1.573M
> 18   44.27  295.7T   1.715M
> master   48.19  323.5T   1.854M

A surprisingly large source of this is psql internal queries.  I first ran the
regression tests with pg_stat_statements activated [1] and noticed that a lot
of the queries, by exec and or plan time, are psql queries.

I patched psql to add "-- psql internal\n" to every internal query and then
used log_min_duration_statement=0 to see how much of the time is spent in psql
- ~13% of the reported duration of all queries.


A lot of that is because describeTbleDetails() ends up with a sequential scan:

regression[1020428][1]=# EXPLAIN ANALYZE SELECT c.oid,
  n.nspname,
  c.relname
FROM pg_catalog.pg_class c
 LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname OPERATOR(pg_catalog.~) '^(pg_class)$' COLLATE pg_catalog.default
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;
┌──┐
│QUERY PLAN 
   │
├──┤
│ Sort  (cost=324.79..324.80 rows=1 width=132) (actual time=7.192..7.194 
rows=1.00 loops=1)│
│   Sort Key: n.nspname, c.relname  
   │
│   Sort Method: quicksort  Memory: 25kB
   │
│   Buffers: shared hit=234 
   │
│   ->  Nested Loop Left Join  (cost=0.00..324.78 rows=1 width=132) (actual 
time=2.489..7.121 rows=1.00 loops=1)   │
│ Join Filter: (n.oid = c.relnamespace) 
   │
│ Buffers: shared hit=231   
   │
│ ->  Seq Scan on pg_class c  (cost=0.00..313.79 rows=1 width=72) 
(actual time=2.473..7.105 rows=1.00 loops=1) │
│   Filter: ((relname ~ '^(pg_class)$'::text) AND 
pg_table_is_visible(oid))│
│   Rows Removed by Filter: 2260
   │
│   Buffers: shared hit=229 
   │
│ ->  Seq Scan on pg_namespace n  (cost=0.00..8.22 rows=222 width=68) 
(actual time=0.011..0.011 rows=1.00 loops=1) │
│   Buffers: shared hit=2   
   │
│ Planning: 
   │
│   Buffers: shared hit=6   
   │
│ Planning Time: 0.520 ms   
   │
│ Execution Time: 7.265 ms  
   │
└──┘
(17 rows)

It seems decidedly not optimal that "\d tablename", without any patterns, ends
up doing a seqscan.  That's bad enough in the regression database, but there
are many PG instances with many many entries in pg_class.

I don't think this was always the case?

If I remove the COLLATE pg_catalog.default, a sane plan is chosen. That's
obviously not the right fix, but seemed interesting enough to mention.


That alone seems makes a surprising 4.5% improvement in an unoptimized -O0
build.


Another psql query that stands out is \d's query for publications. Not because
of runtime, but because of the planning cost. Nuking that (obviously not
correct), is worth another 3% of test time (and some test failues).


I've not analyzed the psql queries any further, but I'd bet that the rest also
do add up.



The most expensive query in the entire regression tests is:
  DO $$
  BEGIN
  FOR r IN 1..1350 LOOP
  DELETE FROM dedup_unique_test_table;
  INSERT INTO dedup_unique_test_table SELECT 1;
  END LOOP;
  END$$;
which makes sense, as that's pretty clearly O(N^2).



There are a surprisingly large number of temporary file creations in the
regression tests. A