I like the idea of preventing promotion to avoid such failures -- it sounds reasonable.
However, the problem remains: if the standby is stopped before TLI 2 has been fully replicated, it fails to start with:

"FATAL: according to history file, WAL location Y belongs to timeline X, but previous recovered WAL file came from timeline X+1"

This happens even if no promotion is attempted; a plain restart of the standby is enough. So the issue isn’t only about when to allow promotion.

Regarding my proposed solution: could you clarify why it isn’t correct? I’d appreciate more detail so I can address your concerns.

---
Alena Vinter
2025-12-25 15:44:13.010 +07 postmaster[474660] LOG: listening on Unix socket "/tmp/QPQwr4NLnl/.s.PGSQL.28826"
2025-12-25 15:44:13.017 +07 startup[474666] LOG: database system was interrupted; last known up at 2025-12-25 15:44:10 +07
2025-12-25 15:44:14.596 +07 startup[474666] LOG: starting backup recovery with redo LSN 0/02000028, checkpoint LSN 0/02000080, on timeline ID 1
2025-12-25 15:44:14.597 +07 startup[474666] LOG: entering standby mode
2025-12-25 15:44:14.603 +07 startup[474666] LOG: redo starts at 0/02000028 on TLI 1
2025-12-25 15:44:14.605 +07 startup[474666] LOG: completed backup recovery with redo LSN 0/02000028 and end LSN 0/02000120
2025-12-25 15:44:14.605 +07 startup[474666] LOG: consistent recovery state reached at 0/02000120
2025-12-25 15:44:14.605 +07 postmaster[474660] LOG: database system is ready to accept read-only connections
2025-12-25 15:44:14.612 +07 walreceiver[474690] LOG: fetching timeline history file for timeline 2 from primary server
2025-12-25 15:44:14.617 +07 walreceiver[474690] LOG: started streaming WAL from primary at 0/03000000 on timeline 1
2025-12-25 15:44:14.639 +07 walreceiver[474690] LOG: replication terminated by primary server
2025-12-25 15:44:14.639 +07 walreceiver[474690] DETAIL: End of WAL reached on timeline 1 at 0/030B20E8.
2025-12-25 15:44:14.667 +07 startup[474666] LOG: new target timeline is 2
2025-12-25 15:44:14.667 +07 startup[474666] LOG: invalid record length at 0/030B20E8: expected at least 24, got 0
2025-12-25 15:44:14.667 +07 walreceiver[474690] LOG: restarted WAL streaming at 0/03000000 on timeline 2
2025-12-25 15:44:19.698 +07 postmaster[474660] LOG: received fast shutdown request
2025-12-25 15:44:19.704 +07 postmaster[474660] LOG: aborting any active transactions
2025-12-25 15:44:19.704 +07 walreceiver[474690] FATAL: terminating walreceiver process due to administrator command
2025-12-25 15:44:19.710 +07 checkpointer[474664] LOG: shutting down
2025-12-25 15:44:19.738 +07 postmaster[474660] LOG: database system is shut down
2025-12-25 15:44:19.839 +07 postmaster[474716] LOG: starting PostgreSQL 19devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 15.2.1 20251111 (Red Hat 15.2.1-4), 64-bit
2025-12-25 15:44:19.839 +07 postmaster[474716] LOG: listening on Unix socket "/tmp/QPQwr4NLnl/.s.PGSQL.28826"
2025-12-25 15:44:19.845 +07 startup[474722] LOG: database system was shut down in recovery at 2025-12-25 15:44:19 +07
2025-12-25 15:44:19.845 +07 startup[474722] LOG: entering standby mode
2025-12-25 15:44:19.848 +07 startup[474722] LOG: redo starts at 0/02000028 on TLI 1
2025-12-25 15:44:19.850 +07 startup[474722] LOG: invalid magic number 0000 in WAL segment 000000020000000000000003, LSN 0/03020000, offset 131072
2025-12-25 15:44:19.850 +07 startup[474722] FATAL: according to history file, WAL location 0/0301FFD0 belongs to timeline 1, but previous recovered WAL file came from timeline 2
2025-12-25 15:44:19.855 +07 postmaster[474716] LOG: startup process (PID 474722) exited with exit code 1
2025-12-25 15:44:19.855 +07 postmaster[474716] LOG: terminating any other active server processes
2025-12-25 15:44:19.855 +07 postmaster[474716] LOG: shutting down due to startup process failure
2025-12-25 15:44:19.856 +07 postmaster[474716] LOG: database system is shut down
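
To make the FATAL line in the log easier to read, here is a minimal sketch of the kind of lookup the message describes: given a timeline history, find which timeline owns an LSN and compare it with the timeline of the WAL file the record was read from. This is a simplified Python model, not the server's C code. The function names and the history table are hypothetical, and the switch point is assumed to be the LSN at which the log reports the end of WAL on timeline 1 (0/030B20E8).

```python
def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/0301FFD0') to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# Hypothetical history as seen from timeline 2: each entry is
# (timeline ID, LSN at which that timeline ends), with None for the
# latest timeline. The switch point is an assumption taken from the log.
HISTORY_TLI2 = [
    (1, parse_lsn("0/030B20E8")),
    (2, None),
]


def timeline_of(lsn, history):
    """Return the timeline that owns `lsn` according to `history`."""
    for tli, end in history:
        # A timeline owns every LSN strictly below its end point.
        if end is None or lsn < end:
            return tli
    raise ValueError("LSN not covered by history")


def check_record(lsn_text, source_tli, history):
    """Return (ok, expected_tli) for a record read from a file on source_tli."""
    expected = timeline_of(parse_lsn(lsn_text), history)
    return expected == source_tli, expected


if __name__ == "__main__":
    # The record from the FATAL message, read from segment 00000002...03.
    ok, expected = check_record("0/0301FFD0", 2, HISTORY_TLI2)
    print("ok" if ok else f"mismatch: history says timeline {expected}")
```

Under these assumptions, 0/0301FFD0 lies below the switch point, so the history assigns it to timeline 1, while the record was read from a timeline 2 segment. That matches the mismatch reported after the plain restart.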
Attachment: recovery_tli_switch_test_without_standby_promotion.pl (Perl program)
