Michael Paquier <mich...@paquier.xyz> writes:
> Add TAP test for archive_cleanup_command and recovery_end_command

grassquit just showed a non-reproducible failure in this test [1]:

# Postmaster PID for node "standby" is 291160
ok 1 - check content from archives
not ok 2 - archive_cleanup_command executed on checkpoint
#   Failed test 'archive_cleanup_command executed on checkpoint'
#   at t/002_archiving.pl line 74.

This test is sending a CHECKPOINT command to the standby and
expecting it to run the archive_cleanup_command, but it looks
like the standby did not actually run any checkpoint:

2022-04-07 16:11:33.060 UTC [291806][not initialized][:0] LOG:  connection received: host=[local]
2022-04-07 16:11:33.078 UTC [291806][client backend][2/15:0] LOG:  connection authorized: user=bf database=postgres application_name=002_archiving.pl
2022-04-07 16:11:33.084 UTC [291806][client backend][2/16:0] LOG:  statement: CHECKPOINT
2022-04-07 16:11:33.092 UTC [291806][client backend][:0] LOG:  disconnection: session time: 0:00:00.032 user=bf database=postgres host=[local]

I am suspicious that the reason is that ProcessUtility does not
ask for a forced checkpoint when in recovery:

    RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
                      (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));

The trouble with this theory is that this test has been there for
nearly six months and this is the first such failure (I scraped the
buildfarm logs to be sure).  Seems like failures should be a lot
more common than that.  I wondered if the recent pg_stats changes
could have affected this, but I don't really see how.

Thoughts?

			regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2022-04-07%2015%3A45%3A48
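
For illustration only, here is a standalone sketch of how the flag mask in the RequestCheckpoint() call quoted above differs between a primary and a standby.  It is not PostgreSQL source: the numeric flag values are stand-ins chosen for the example, and checkpoint_request_flags() and request_is_forced() are hypothetical helpers, with the in_recovery argument taking the place of RecoveryInProgress().

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in bit values for illustration; not copied from PostgreSQL headers. */
#define CHECKPOINT_IMMEDIATE 0x01
#define CHECKPOINT_FORCE     0x02
#define CHECKPOINT_WAIT      0x04

/*
 * Hypothetical helper with the same shape as the expression quoted from
 * ProcessUtility: CHECKPOINT_FORCE is added only when not in recovery.
 */
static int
checkpoint_request_flags(bool in_recovery)
{
    int flags = CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
        (in_recovery ? 0 : CHECKPOINT_FORCE);

    /* Both branches always ask the caller to wait for completion. */
    assert(flags & CHECKPOINT_WAIT);
    return flags;
}

/* Hypothetical helper: true if the request carries the FORCE bit. */
static bool
request_is_forced(int flags)
{
    return (flags & CHECKPOINT_FORCE) != 0;
}
```

With these definitions, request_is_forced(checkpoint_request_flags(false)) is true and request_is_forced(checkpoint_request_flags(true)) is false.  If the theory holds, a CHECKPOINT issued on a standby would then be a request that the server may decline to act on, which would match the absence of any checkpoint activity in the log above.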