On 16/01/2026 08:00, Alexander Lakhin wrote:
03.01.2026 04:40, Tom Lane wrote:
In the past couple of days, scorpion and skink have failed
the nbtree_half_dead_pages test with identical symptoms [1][2]:
...
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-01-02%2004%3A54%3A38
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-12-31%2003%3A34%3A51
I reproduced such failures locally (when running multiple test
instances under Valgrind concurrently) and discovered that the test might
fail due to autovacuum activity. (Apparently this is because
heap_prune_satisfies_vacuum() returns HEAPTUPLE_RECENTLY_DEAD rather than
HEAPTUPLE_DEAD for the tuples in question, so
prune_freeze_plan()/heap_page_prune_and_freeze() finds 0 lpdead_items.)
pgsql.build/testrun/nbtree/regress/log/postmaster.log in [2] contains:
2025-12-31 06:00:41.778 CET autovacuum worker[2250984] LOG: automatic analyze of table "template1.information_schema.sql_features"
(The postmaster log is missing in [1] for some reason...)
I've also managed to reproduce this just with the attached patch and:
echo "autovacuum_naptime = 1" > /tmp/temp.config
TEMP_CONFIG=/tmp/temp.config make -s check -C src/test/modules/nbtree
ok 86 - nbtree_half_dead_pages 319 ms
not ok 87 - nbtree_half_dead_pages 324 ms
ok 88 - nbtree_half_dead_pages 326 ms
...
# 1 of 101 tests failed.
Great, thanks! I was able to readily reproduce it by adding a one-second
delay to auto-analyze (you still need to run the test around 5 times in a
row for the auto-analyze to kick in):
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index aa4fbec143f..4f91ce84786 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -645,6 +645,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
                     StartTransactionCommand();
                     /* functions in indexes may want a snapshot set */
                     PushActiveSnapshot(GetTransactionSnapshot());
+                    if (AmAutoVacuumWorkerProcess())
+                        pg_usleep(1000000);
                 }
 
                 analyze_rel(vrel->oid, vrel->relation, params,
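To repeat the test run a few times in a row, a small shell loop along
these lines should do. This is only a sketch: repeat_until_fail is a
made-up helper name, not something in the tree, and the commented-out
invocation assumes you are at the top of a configured source tree with
the delay patch applied.

```shell
# Hypothetical helper: run a command up to N times, stopping at the
# first failing run and reporting which one it was.
repeat_until_fail() {
    runs="$1"; shift
    i=1
    while [ "$i" -le "$runs" ]; do
        "$@" || { echo "failed on run $i"; return 1; }
        i=$((i + 1))
    done
    echo "all $runs runs passed"
}

# Assumed usage, from the top of the source tree:
# repeat_until_fail 5 make -s check -C src/test/modules/nbtree
```

Stopping at the first failure keeps the regression output of the failing
run around for inspection instead of overwriting it with the next run.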
Pushed a fix that uses a little helper procedure to wait until the
snapshots holding back the vacuum horizon have finished. It's the same
approach as in the syscache-update-pruned test.
- Heikki