Hi,

On 2025-01-24 21:00:00 +0200, Alexander Lakhin wrote:
> 24.01.2025 18:42, Tom Lane wrote:
> > I realized just now that drongo has been intermittently failing like this:
> >
> > 147/256 postgresql:recovery / recovery/035_standby_logical_decoding
> > ERROR 2116.16s (exit status 255 or signal 127 SIGinvalid)
> > ------------------------------------- 8< -------------------------------------
> > stderr:
> > # Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
> > # at C:/prog/bf/root/REL_16_STABLE/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 229.
> > # poll_query_until timed out executing this query:
> > # select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb'
> > # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > # Failed test 'confl_active_logicalslot updated'
> > # at C:/prog/bf/root/REL_16_STABLE/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235.
> > # Tests were run but no plan was declared and done_testing() was not seen.
> > # Looks like your test exited with 255 just after 24.
> >
> > This has been happening for some time, in all three branches where
> > that test script exists. The oldest failure that looks like that in
> > the v16 branch is
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-09-06%2004%3A19%3A35
> >
> > However, there are older failures showing a timeout of
> > 035_standby_logical_decoding.pl that don't provide any detail, but
> > might well be the same thing. The oldest one of those is from
> > 2024-05-01, which is still considerably later than the test script
> > itself (added on 2023-04-08). So it would seem that this is something
> > we broke during 2024, rather than an aboriginal problem in this test.
> >
> > A search of the buildfarm logs did not turn up similar failures
> > on any other animals.
> >
> > I have no idea how to proceed on narrowing down the cause...
> >
>
> Please take a look at the list of such failures since 2024-06-01 I
> collected here:
> https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#035_standby_logical_decoding_standby.pl_fails_due_to_missing_activeslot_invalidation
>
> There is also a reference to a discussion of the failure there:
> https://www.postgresql.org/message-id/657815a2-5a89-fcc1-1c9d-d77a6986b...@gmail.com
> (In short, I observed that that test suffers from bgwriter's activity.)
I don't think it's quite right to blame this on bgwriter. E.g. a checkpoint
will also emit XLOG_RUNNING_XACTS. The problem is that the test is just racy,
and that needs to be fixed.

Greetings,

Andres Freund