Re: Test timings are increasing too fast for cfbot

Andres Freund Tue, 24 Mar 2026 20:15:48 -0700

Hi,

On 2026-03-24 17:30:46 -0400, Andres Freund wrote:
> Due to the number of times we run the main regression tests, they have a
> particularly large effect on test resources.
>
>              time     cycles   syscalls
> 17           37.13    239.2T   1.573M
> 18           44.27      295.7T   1.715M
> master       48.19      323.5T   1.854M


A surprisingly large source of this is psql internal queries.  I first ran the
regression tests with pg_stat_statements activated [1] and noticed that a lot
of the queries, by exec and or plan time, are psql queries.

I patched psql to add "-- psql internal\n" to every internal query and then
used log_min_duration_statement=0 to see how much of the time is spent in psql
- ~13% of the reported duration of all queries.


A lot of that is because describeTbleDetails() ends up with a sequential scan:

regression[1020428][1]=# EXPLAIN ANALYZE SELECT c.oid,
  n.nspname,
  c.relname
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname OPERATOR(pg_catalog.~) '^(pg_class)$' COLLATE pg_catalog.default
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                        QUERY PLAN             
                                           │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Sort  (cost=324.79..324.80 rows=1 width=132) (actual time=7.192..7.194 
rows=1.00 loops=1)                                │
│   Sort Key: n.nspname, c.relname                                              
                                           │
│   Sort Method: quicksort  Memory: 25kB                                        
                                           │
│   Buffers: shared hit=234                                                     
                                           │
│   ->  Nested Loop Left Join  (cost=0.00..324.78 rows=1 width=132) (actual 
time=2.489..7.121 rows=1.00 loops=1)           │
│         Join Filter: (n.oid = c.relnamespace)                                 
                                           │
│         Buffers: shared hit=231                                               
                                           │
│         ->  Seq Scan on pg_class c  (cost=0.00..313.79 rows=1 width=72) 
(actual time=2.473..7.105 rows=1.00 loops=1)     │
│               Filter: ((relname ~ '^(pg_class)$'::text) AND 
pg_table_is_visible(oid))                                    │
│               Rows Removed by Filter: 2260                                    
                                           │
│               Buffers: shared hit=229                                         
                                           │
│         ->  Seq Scan on pg_namespace n  (cost=0.00..8.22 rows=222 width=68) 
(actual time=0.011..0.011 rows=1.00 loops=1) │
│               Buffers: shared hit=2                                           
                                           │
│ Planning:                                                                     
                                           │
│   Buffers: shared hit=6                                                       
                                           │
│ Planning Time: 0.520 ms                                                       
                                           │
│ Execution Time: 7.265 ms                                                      
                                           │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(17 rows)

It seems decidedly not optimal that "\d tablename", without any patterns, ends
up doing a seqscan.  That's bad enough in the regression database, but there
are many PG instances with many many entries in pg_class.

I don't think this was always the case?

If I remove the COLLATE pg_catalog.default, a sane plan is chosen. That's
obviously not the right fix, but seemed interesting enough to mention.


That alone seems makes a surprising 4.5% improvement in an unoptimized -O0
build.


Another psql query that stands out is \d's query for publications. Not because
of runtime, but because of the planning cost. Nuking that (obviously not
correct), is worth another 3% of test time (and some test failues).


I've not analyzed the psql queries any further, but I'd bet that the rest also
do add up.



The most expensive query in the entire regression tests is:
  DO $$
  BEGIN
      FOR r IN 1..1350 LOOP
          DELETE FROM dedup_unique_test_table;
          INSERT INTO dedup_unique_test_table SELECT 1;
      END LOOP;
  END$$;
which makes sense, as that's pretty clearly O(N^2).



There are a surprisingly large number of temporary file creations in the
regression tests. A lot of them due to shared file sets in the context of
parallelism.  I wonder whether that is a partial cause for the slowness on
windows, which has very slow metadata operations and slower data caching.


The slowest test is stats_ext.sql - Not surprising, it does sequential scans
of tables with ~1000-10000 rows over and over again.  I don't see why it has
to do that with as many rows as it does.



Another thing we spend a decent amount of time, distributed over many places,
is FROM generate_series(x, y), which ends up slow due to nodeFunctionscan.c
always using a tuplestore.


Code wise, the most immediately noticeable things are
1) MEMORY_CONTEXT_CHECKING
2) AssertCheckRanges() (only in the brin test, but there a very large portion
   of the runtime)
3) verify_compact_attribute(), pretty spread around


Greetings,

Andres Freund

[1] which annoyingly generates a bunch of test failures due to "Query
Identifier" getting added to tests

Re: Test timings are increasing too fast for cfbot

Reply via email to