On Wed, Mar 04, 2026 at 10:00:00AM +0200, Alexander Lakhin wrote:
> Yes, 012_subtransactions doesn't fail with aggressive bgwriter, as I noted
> before. I mentioned it exactly to show that stop does matter here. But if
> we recognize teardown_node in this context as risky, maybe it would make
> sense to review also other tests in recovery/. I already wrote about
> 004_timeline_switch, but probably there are more. E.g., 028_pitr_timelines
> (I haven't tested it intensively yet) does:
> $node_primary->stop('immediate');
>
> # Promote the standby, and switch WAL so that it archives a WAL segment
> # that contains all the INSERTs, on a new timeline.
> $node_standby->promote;

I think that your take about 004 is actually right, looking at it more
closely. By tearing down the primary, it could be possible that
standby_2 receives more records than standby_1. Then, when we try to
reconnect standby_2 to the promoted standby_1, the TLI could fork, in
theory. The fix would be the same: by switching to stop(), we'd make
sure that both standby_1 and standby_2 have received all the records
from the primary. We can also remove the wait_for_catchup() before the
primary is stopped, as it offers no protection against standby_2
receiving more records from the primary than standby_1.

It is not surprising that this failure with a three-node scenario is
much harder to reproduce. I have run the same loop as for 009, but
things are super stable even after 50-ish iterations. By reading the
code, I agree that the failure is possible to reach in theory, though.
Some hardcoded sleeps would do the trick (make bgwriter aggressive,
patch the checkpointer so that we do not send the last standby snapshot
records to standby_2, only to standby_1, etc.).

Did you find any buildfarm failures involving 028? I cannot get excited
about changing tests where nothing has happened, and this test looks OK
as we don't do a switchover. For 004, we have at least one failure
recorded based on what you said. That's a fact sufficient for me to fix
things, for 004.
--
Michael
