Hi,

On 2025-01-24 21:00:00 +0200, Alexander Lakhin wrote:
> 24.01.2025 18:42, Tom Lane wrote:
> > I realized just now that drongo has been intermittently failing like this:
> > 
> > 147/256 postgresql:recovery / recovery/035_standby_logical_decoding ERROR 2116.16s (exit status 255 or signal 127 SIGinvalid)
> > ------------------------------------- 8< -------------------------------------
> > stderr:
> > #   Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
> > #   at C:/prog/bf/root/REL_16_STABLE/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 229.
> > # poll_query_until timed out executing this query:
> > # select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb'
> > # expecting this output:
> > # t
> > # last actual query output:
> > # f
> > # with stderr:
> > #   Failed test 'confl_active_logicalslot updated'
> > #   at C:/prog/bf/root/REL_16_STABLE/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235.
> > # Tests were run but no plan was declared and done_testing() was not seen.
> > # Looks like your test exited with 255 just after 24.
> > 
> > This has been happening for some time, in all three branches where
> > that test script exists.  The oldest failure that looks like that in
> > the v16 branch is
> > 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-09-06%2004%3A19%3A35
> > 
> > However, there are older failures showing a timeout of
> > 035_standby_logical_decoding.pl that don't provide any detail, but
> > might well be the same thing.  The oldest one of those is from
> > 2024-05-01, which is still considerably later than the test script
> > itself (added on 2023-04-08).  So it would seem that this is something
> > we broke during 2024, rather than an aboriginal problem in this test.
> > 
> > A search of the buildfarm logs did not turn up similar failures
> > on any other animals.
> > 
> > I have no idea how to proceed on narrowing down the cause...
> > 
> 
> Please take a look at the list of such failures since 2024-06-01 I
> collected here:
> https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#035_standby_logical_decoding_standby.pl_fails_due_to_missing_activeslot_invalidation
> 
> There is also a reference to a discussion of the failure there:
> https://www.postgresql.org/message-id/657815a2-5a89-fcc1-1c9d-d77a6986b...@gmail.com
> (In short, I observed that that test suffers from bgwriter's activity.)

I don't think it's quite right to blame this on bgwriter: a checkpoint, for
example, will also emit XLOG_RUNNING_XACTS. The problem is that the test
itself is racy, and that is what needs to be fixed.

Greetings,

Andres Freund

