Re: Test timings are increasing too fast for cfbot
On 2026-03-26 11:09:36 +1300, David Rowley wrote: > On Wed, 25 Mar 2026 at 16:15, Andres Freund wrote: > > Code wise, the most immediately noticeable things are > > > 3) verify_compact_attribute(), pretty spread around > > We do now have TupleDescFinalize(), where those could be checked just > once rather than on each call to TupleDescCompactAttr(). That's not > quite as watertight a guarantee as someone could change the > FormData_pg_attribute after TupleDescFinalize(). Just doing it in > TupleDescFinalize() would at least still catch the places where people > forget to call populate_compact_attribute() before > TupleDescFinalize(). Maybe verify_compact_attribute() could just do an assert comparison between the underlying non-compact attribute and the compact one? Or even just an assert checking if the compact attribute is initialized, with the full checking happening in TupleDescFinalize(), as you suggest? > I would have expected this to be a little less overhead now since > d8a859d22 removed the calls to TupleDescCompactAttr() in the main > deforming routine. Maybe I should just make that change in the other > deformers...? Might be worth it. > Do you have an idea of which callers of verify_compact_attribute() are > causing the most overhead? I'm not sure how much to believe the profile when the costs are as distributed as they are here. But according to the profile it's - heap_form_tuple - heap_form_minimal_tuple - index_getattr - nocachegetattr Greetings, Andres Freund
Re: Test timings are increasing too fast for cfbot
Hi, On 2026-03-25 12:36:12 +0800, Julien Rouhaud wrote: > On Tue, Mar 24, 2026 at 11:15:32PM -0400, Andres Freund wrote: > > > > A surprisingly large source of this is psql internal queries. I first ran > > the > > regression tests with pg_stat_statements activated [1] and noticed that a > > lot > > of the queries, by exec and or plan time, are psql queries. > > > > [1] which annoyingly generates a bunch of test failures due to "Query > > Identifier" getting added to tests > > FWIW `compute_query_id = regress` is supposed to avoid that problem. Hm. I guess that should work. Thanks! It's pretty annoying that one has to manually specify it to keep the tests passing. I wonder if we should make pg_regress specify it to avoid that. Greetings, Andres Freund
Re: Test timings are increasing too fast for cfbot
On Wed, 25 Mar 2026 at 16:15, Andres Freund wrote: > Code wise, the most immediately noticeable things are > 3) verify_compact_attribute(), pretty spread around We do now have TupleDescFinalize(), where those could be checked just once rather than on each call to TupleDescCompactAttr(). That's not quite as watertight a guarantee as someone could change the FormData_pg_attribute after TupleDescFinalize(). Just doing it in TupleDescFinalize() would at least still catch the places where people forget to call populate_compact_attribute() before TupleDescFinalize(). I would have expected this to be a little less overhead now since d8a859d22 removed the calls to TupleDescCompactAttr() in the main deforming routine. Maybe I should just make that change in the other deformers...? Do you have an idea of which callers of verify_compact_attribute() are causing the most overhead? David
Re: Test timings are increasing too fast for cfbot
Hi, On 2026-03-25 15:38:01 +0100, Tomas Vondra wrote: > On 3/25/26 04:15, Andres Freund wrote: > > ... > > > > The slowest test is stats_ext.sql - Not surprising, it does sequential scans > > of tables with ~1000-1 rows over and over again. I don't see why it has > > to do that with as many rows as it does. > > > > IIRC we needed to use non-trivial amounts of data to ensure building the > right right type of statistics (e.g. no MCV). But I understand it can be > annoyingly expensive, so I'll try to make it cheaper. Great. It might also be worth checking if any of the query conditions can be made cheaper without actually changing what they do test. > > ... > > 2) AssertCheckRanges() (only in the brin test, but there a very large > > portion > >of the runtime) > > True. It is a very comprehensive validation of the ranges, and it was > very useful - particularly during development. But I'll try to make it > more targeted at the stuff actually changed / called less often. Makes sense. There might also be some changs that make it faster without loosing any coverage. E.g. not using FunctionCall2Coll() - which initializes stuff on every call - but doing InitFunctionCallInfoData() once and then update arguments + FunctionCallInvoke() for each call. > Both changes will require time (so that we don't lose test coverage), > but I assume it's OK if that happens sometime after the feature freeze. Yea. We're not running out of credits tomorrow or something like that. Greetings, Andres Freund
Re: Test timings are increasing too fast for cfbot
On 3/25/26 04:15, Andres Freund wrote: > ... > > The slowest test is stats_ext.sql - Not surprising, it does sequential scans > of tables with ~1000-1 rows over and over again. I don't see why it has > to do that with as many rows as it does. > IIRC we needed to use non-trivial amounts of data to ensure building the right right type of statistics (e.g. no MCV). But I understand it can be annoyingly expensive, so I'll try to make it cheaper. > ... > 2) AssertCheckRanges() (only in the brin test, but there a very large portion >of the runtime) True. It is a very comprehensive validation of the ranges, and it was very useful - particularly during development. But I'll try to make it more targeted at the stuff actually changed / called less often. Both changes will require time (so that we don't lose test coverage), but I assume it's OK if that happens sometime after the feature freeze. regards -- Tomas Vondra
Re: Test timings are increasing too fast for cfbot
Hi, On Tue, Mar 24, 2026 at 11:15:32PM -0400, Andres Freund wrote: > > A surprisingly large source of this is psql internal queries. I first ran the > regression tests with pg_stat_statements activated [1] and noticed that a lot > of the queries, by exec and or plan time, are psql queries. > > [1] which annoyingly generates a bunch of test failures due to "Query > Identifier" getting added to tests FWIW `compute_query_id = regress` is supposed to avoid that problem.
Re: Test timings are increasing too fast for cfbot
Hi, On 2026-03-24 17:30:46 -0400, Andres Freund wrote: > Due to the number of times we run the main regression tests, they have a > particularly large effect on test resources. > > time cycles syscalls > 17 37.13239.2T 1.573M > 18 44.27 295.7T 1.715M > master 48.19 323.5T 1.854M A surprisingly large source of this is psql internal queries. I first ran the regression tests with pg_stat_statements activated [1] and noticed that a lot of the queries, by exec and or plan time, are psql queries. I patched psql to add "-- psql internal\n" to every internal query and then used log_min_duration_statement=0 to see how much of the time is spent in psql - ~13% of the reported duration of all queries. A lot of that is because describeTbleDetails() ends up with a sequential scan: regression[1020428][1]=# EXPLAIN ANALYZE SELECT c.oid, n.nspname, c.relname FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relname OPERATOR(pg_catalog.~) '^(pg_class)$' COLLATE pg_catalog.default AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 2, 3; ┌──┐ │QUERY PLAN │ ├──┤ │ Sort (cost=324.79..324.80 rows=1 width=132) (actual time=7.192..7.194 rows=1.00 loops=1)│ │ Sort Key: n.nspname, c.relname │ │ Sort Method: quicksort Memory: 25kB │ │ Buffers: shared hit=234 │ │ -> Nested Loop Left Join (cost=0.00..324.78 rows=1 width=132) (actual time=2.489..7.121 rows=1.00 loops=1) │ │ Join Filter: (n.oid = c.relnamespace) │ │ Buffers: shared hit=231 │ │ -> Seq Scan on pg_class c (cost=0.00..313.79 rows=1 width=72) (actual time=2.473..7.105 rows=1.00 loops=1) │ │ Filter: ((relname ~ '^(pg_class)$'::text) AND pg_table_is_visible(oid))│ │ Rows Removed by Filter: 2260 │ │ Buffers: shared hit=229 │ │ -> Seq Scan on pg_namespace n (cost=0.00..8.22 rows=222 width=68) (actual time=0.011..0.011 rows=1.00 loops=1) │ │ Buffers: shared hit=2 │ │ Planning: │ │ Buffers: shared hit=6 │ │ Planning Time: 0.520 ms │ │ Execution Time: 7.265 ms │ └──┘ (17 rows) It seems decidedly not optimal that "\d tablename", without any patterns, ends up doing a seqscan. That's bad enough in the regression database, but there are many PG instances with many many entries in pg_class. I don't think this was always the case? If I remove the COLLATE pg_catalog.default, a sane plan is chosen. That's obviously not the right fix, but seemed interesting enough to mention. That alone seems makes a surprising 4.5% improvement in an unoptimized -O0 build. Another psql query that stands out is \d's query for publications. Not because of runtime, but because of the planning cost. Nuking that (obviously not correct), is worth another 3% of test time (and some test failues). I've not analyzed the psql queries any further, but I'd bet that the rest also do add up. The most expensive query in the entire regression tests is: DO $$ BEGIN FOR r IN 1..1350 LOOP DELETE FROM dedup_unique_test_table; INSERT INTO dedup_unique_test_table SELECT 1; END LOOP; END$$; which makes sense, as that's pretty clearly O(N^2). There are a surprisingly large number of temporary file creations in the regression tests. A
