On Tue, Jan 27, 2026 at 11:37 AM Andres Freund <[email protected]> wrote: > > But I wonder if you can't rename("old", "new") where "new" is a file that > > has already been unlinked (or renamed over) that someone still holds open, > > or something like that... > > I don't see a source of that that would be specific to this test though :(. We > do wait for pg_basebackup to have shut down, which wrote backup.label (which > was "manifactured" during streaming by basebackup.c). > > Perhaps we should crank up log level in the test? No idea if it'll help, but > right now I don't even know where to start looking.
I tried sticking a pg_sleep(30) in just before starting the standby node, and that didn't help, so it doesn't seem like it's a race condition. Here's what the standby log file looks like with log_min_messages=DEBUG2: 2026-01-27 17:19:25.262 GMT postmaster[4932] DEBUG: registering background worker "logical replication launcher" 2026-01-27 17:19:25.264 GMT postmaster[4932] DEBUG: dynamic shared memory system will support 229 segments 2026-01-27 17:19:25.264 GMT postmaster[4932] DEBUG: created dynamic shared memory control segment 3769552926 (9176 bytes) 2026-01-27 17:19:25.266 GMT postmaster[4932] DEBUG: max_safe_fds = 990, usable_fds = 1000, already_open = 3 2026-01-27 17:19:25.268 GMT postmaster[4932] LOG: starting PostgreSQL 19devel on x86_64-windows, compiled by msvc-19.29.30159, 64-bit 2026-01-27 17:19:25.271 GMT postmaster[4932] LOG: listening on Unix socket "C:/Windows/TEMP/3xesO1s4ba/.s.PGSQL.17575" 2026-01-27 17:19:25.273 GMT postmaster[4932] DEBUG: updating PMState from PM_INIT to PM_STARTUP 2026-01-27 17:19:25.273 GMT postmaster[4932] DEBUG: assigned pm child slot 57 for io worker 2026-01-27 17:19:25.275 GMT postmaster[4932] DEBUG: assigned pm child slot 58 for io worker 2026-01-27 17:19:25.277 GMT postmaster[4932] DEBUG: assigned pm child slot 59 for io worker 2026-01-27 17:19:25.278 GMT postmaster[4932] DEBUG: assigned pm child slot 56 for checkpointer 2026-01-27 17:19:25.280 GMT postmaster[4932] DEBUG: assigned pm child slot 55 for background writer 2026-01-27 17:19:25.281 GMT postmaster[4932] DEBUG: assigned pm child slot 89 for startup 2026-01-27 17:19:25.308 GMT checkpointer[6560] DEBUG: checkpointer updated shared memory configuration values 2026-01-27 17:19:25.314 GMT startup[2488] LOG: database system was interrupted; last known up at 2026-01-27 17:19:21 GMT 2026-01-27 17:19:25.317 GMT startup[2488] DEBUG: removing all temporary WAL segments The system cannot find the file specified. 2026-01-27 17:19:25.336 GMT startup[2488] DEBUG: could not restore file "00000002.history" from archive: child process exited with exit code 1 2026-01-27 17:19:25.337 GMT startup[2488] DEBUG: backup time 2026-01-27 17:19:21 GMT in file "backup_label" 2026-01-27 17:19:25.337 GMT startup[2488] DEBUG: backup label pg_basebackup base backup in file "backup_label" 2026-01-27 17:19:25.337 GMT startup[2488] DEBUG: backup timeline 1 in file "backup_label" 2026-01-27 17:19:25.337 GMT startup[2488] LOG: starting backup recovery with redo LSN 0/2A000028, checkpoint LSN 0/2A000080, on timeline ID 1 The system cannot find the file specified. 2026-01-27 17:19:25.352 GMT startup[2488] DEBUG: could not restore file "00000001000000000000002A" from archive: child process exited with exit code 1 2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: checkpoint record is at 0/2A000080 2026-01-27 17:19:25.353 GMT startup[2488] LOG: entering standby mode 2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: redo record is at 0/2A000028; shutdown false 2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: next transaction ID: 769; next OID: 24576 2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: next MultiXactId: 1; next MultiXactOffset: 1 2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: oldest unfrozen transaction ID: 760, in database 1 2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: oldest MultiXactId: 1, in database 1 2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: commit timestamp Xid oldest/newest: 0/0 2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: transaction ID wrap limit is 2147484407, limited by database with OID 1 2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1 2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: starting up replication slots 2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: xmin required by slots: data 0, catalog 0 2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: starting up replication origin progress state 2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: didn't need to unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist 2026-01-27 17:19:38.938 GMT startup[2488] FATAL: could not rename file "backup_label" to "backup_label.old": Permission denied 2026-01-27 17:19:38.983 GMT postmaster[4932] DEBUG: releasing pm child slot 89 2026-01-27 17:19:38.983 GMT postmaster[4932] LOG: startup process (PID 2488) exited with exit code 1 2026-01-27 17:19:38.983 GMT postmaster[4932] LOG: aborting startup due to startup process failure 2026-01-27 17:19:38.983 GMT postmaster[4932] DEBUG: cleaning up dynamic shared memory control segment with ID 3769552926 2026-01-27 17:19:38.985 GMT postmaster[4932] LOG: database system is shut down Unfortunately, I don't see any clues there. The "The system cannot find the file specified." messages look like they might be a clue, but I think they are not, because they also occur in 040_standby_failover_slots_sync_standby1.log, and that test passes. At the point where this log file shows the FATAL error, that log file continues thus: 2026-01-27 17:18:36.905 GMT startup[1420] DEBUG: resetting unlogged relations: cleanup 1 init 0 2026-01-27 17:18:36.906 GMT startup[1420] DEBUG: initializing for hot standby 2026-01-27 17:18:36.906 GMT startup[1420] LOG: redo starts at 0/02000028 2026-01-27 17:18:36.906 GMT startup[1420] DEBUG: recovery snapshots are now enabled 2026-01-27 17:18:36.906 GMT startup[1420] CONTEXT: WAL redo at 0/02000048 for Standby/RUNNING_XACTS: nextXid 769 latestCompletedXid 768 oldestRunningXid 769 2026-01-27 17:18:36.907 GMT startup[1420] DEBUG: end of backup record reached 2026-01-27 17:18:36.907 GMT startup[1420] CONTEXT: WAL redo at 0/02000100 for XLOG/BACKUP_END: 0/02000028 2026-01-27 17:18:36.907 GMT startup[1420] DEBUG: end of backup reached Which again seems totally normal. -- Robert Haas EDB: http://www.enterprisedb.com
