On 16/01/2026 08:00, Alexander Lakhin wrote:
03.01.2026 04:40, Tom Lane wrote:
In the past couple of days, scorpion and skink have failed
the nbtree_half_dead_pages test with identical symptoms [1][2]:
...
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-01-02%2004%3A54%3A38
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-12-31%2003%3A34%3A51

I reproduced such failures locally (when running multiple test
instances under Valgrind concurrently) and discovered that the test might
fail due to autovacuum activity. (Apparently because
heap_prune_satisfies_vacuum() returns HEAPTUPLE_RECENTLY_DEAD, not
HEAPTUPLE_DEAD, for the tuples in question, so prune_freeze_plan()/
heap_page_prune_and_freeze() finds 0 lpdead_items.)

pgsql.build/testrun/nbtree/regress/log/postmaster.log in [2] contains:
2025-12-31 06:00:41.778 CET autovacuum worker[2250984] LOG: automatic analyze of table "template1.information_schema.sql_features"

(The postmaster log is missing in [1] for some reason...)

I've also managed to reproduce this just with the attached patch and:
echo "autovacuum_naptime = 1" > /tmp/temp.config
TEMP_CONFIG=/tmp/temp.config make -s check -C src/test/modules/nbtree

ok 86        - nbtree_half_dead_pages                    319 ms
not ok 87    - nbtree_half_dead_pages                    324 ms
ok 88        - nbtree_half_dead_pages                    326 ms
...
# 1 of 101 tests failed.

Great, thanks! I was able to readily reproduce it by adding a delay to auto-analyze (you still need to run the test around 5 times in a row for auto-analyze to kick in):

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index aa4fbec143f..4f91ce84786 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -645,6 +645,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
                                        StartTransactionCommand();
                                        /* functions in indexes may want a snapshot set */
                                        PushActiveSnapshot(GetTransactionSnapshot());
+                                       if (AmAutoVacuumWorkerProcess())
+                                               pg_usleep(1000000);
                                }

                                analyze_rel(vrel->oid, vrel->relation, params,
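
For anyone repeating this, the "around 5 times in a row" part could be scripted roughly as below. This is only a sketch: run_until_failure is a hypothetical helper name, not something in the tree, and the usage lines assume the temp.config and make target quoted earlier in this thread.

```shell
# Hypothetical helper (not part of PostgreSQL): run a command up to N times
# and stop at the first run that fails.
run_until_failure() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        if ! "$@"; then
            echo "run $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max_runs runs passed"
    return 0
}

# Usage, with the config and test target quoted earlier in this thread:
#   echo "autovacuum_naptime = 1" > /tmp/temp.config
#   run_until_failure 5 env TEMP_CONFIG=/tmp/temp.config \
#       make -s check -C src/test/modules/nbtree
```

Stopping at the first failing run keeps that run's logs from being overwritten by the next one.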

Pushed a fix using a little helper procedure to wait for snapshots holding back the vacuum horizon to finish. It's the same approach as in the syscache-update-pruned test.

- Heikki


