Hello, Alex.

At Fri, 26 Aug 2022 10:57:25 +0200, Alexander Kukushkin <cyberd...@gmail.com> wrote in
> On Fri, 26 Aug 2022 at 10:04, Kyotaro Horiguchi <horikyota....@gmail.com>
> wrote:
> > What I still don't understand is why pg_rewind doesn't work for the
> > old primary in that case. When archive_mode=on, the old primary has
> > the complete set of WAL files counting both pg_wal and its archive. So
> > as with the previous repro, pg_rewind -c ought to work (but it
> > uses its own archive this time). In that sense the proposed solution
> > is still not needed in this case.
> >
>
> The pg_rewind finishes successfully. But as a result it removes some files
> from pg_wal that are required to perform recovery because they are missing
> on the new primary.

AFAICS pg_rewind doesn't. On the contrary, the -c option restores all
the segments after the last (common) checkpoint, and all of them are
left alone after pg_rewind finishes. postgres itself removes the WAL
files after recovery. The post-promotion cleanup and a checkpoint
remove the files on the previous timeline.

Before pg_rewind runs in the repro below, the old primary has the
following segments.

TLI1: 2 8 9 A B C D

Just after pg_rewind finishes, the old primary has the following
segments.

TLI1: 2 3 5 6 7
TLI2: 4 (and 00000002.history)

pg_rewind copied 1-2 to 1-3, 2-4 and the history file from the new
primary, and 1-4 to 1-7 from the archive. After the rewind finished,
1-4 and 1-8 to 1-D had been removed since the new primary didn't have
them.

Recovery starts from 1-3 and promotes at 0/4_000000. postgres removes
1-5 to 1-7 by the post-promotion cleanup and removes 1-2 to 1-4 by a
restartpoint. All of the segments are useless after the old primary
promotes.

When the old primary starts, it uses 1-3 and 2-4 for recovery and
fails to fetch 2-5 from the new primary. But it is not an issue of
pg_rewind at all.

> > A bit harder situation comes after the server successfully rewound; if
> > the new primary goes so far that the old primary cannot connect. Even
> > in that case, you can copy-in the required WAL files or configure
> > restore_command of the old primary so that it finds required WAL files
> > there.
> >
>
> Yes, we can do the backup of pg_wal before running pg_rewind, but it feels

So, if I understand you correctly, the issue you are complaining about
concerns not the WAL segments on the old timeline but those on the new
timeline, which have nothing to do with what pg_rewind does. As in the
case of pg_basebackup, the missing segments need to be copied from the
new primary somehow, since the old primary never had a chance to have
them.

> very ugly, because we will also have to clean this "backup" after a
> successful recovery.

What do you mean by the "backup" here?
Concretely, which WAL segments do you feel need to be removed, for
example, in the repro case? Or, could you show your issue with
something like the repro below?

> It would be much better if pg_rewind didn't remove WAL files between the
> last common checkpoint and diverged LSN in the first place.

Thus I don't follow this.

regards.

(Fixed a bug and slightly modified)
====
# killall -9 postgres
# rm -r oldprim newprim oldarch newarch oldprim.log newprim.log
mkdir newarch oldarch
initdb -k -D oldprim
echo "archive_mode = 'on'">> oldprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/oldarch/%f'">> oldprim/postgresql.conf
pg_ctl -D oldprim -o '-p 5432' -l oldprim.log start
psql -p 5432 -c 'create table t(a int)'
pg_basebackup -D newprim -p 5432
echo "primary_conninfo='host=/tmp port=5432'">> newprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/newarch/%f'">> newprim/postgresql.conf
touch newprim/standby.signal
pg_ctl -D newprim -o '-p 5433' -l newprim.log start

# the last common checkpoint
psql -p 5432 -c 'checkpoint'

# record approx. diverging WAL segment
start_wal=`psql -p 5433 -Atc "select pg_walfile_name(pg_last_wal_replay_lsn() - (select setting from pg_settings where name = 'wal_segment_size')::int);"`
psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'
pg_ctl -D newprim promote

# old primary loses diverging WAL segment
for i in $(seq 1 4); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5432 -c 'checkpoint;'
psql -p 5433 -c 'checkpoint;'

# old primary cannot archive any more
echo "archive_command = 'false'">> oldprim/postgresql.conf
pg_ctl -D oldprim reload
pg_ctl -D oldprim stop

# rewind the old primary, using its own archive
# pg_rewind -D oldprim --source-server='port=5433' # should fail
echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/oldarch/%f %p'">> oldprim/postgresql.conf
pg_rewind -D oldprim --source-server='port=5433' -c

# advance WAL on the new primary; new primary loses the launching WAL seg
for i in $(seq 1 4); do psql -p 5433 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5433 -c 'checkpoint'
echo "primary_conninfo='host=/tmp port=5433'">> oldprim/postgresql.conf
touch oldprim/standby.signal

postgres -D oldprim  # fails with "WAL file has been removed"

# The alternative of copying-in
# echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/newarch/%f %p'">> oldprim/postgresql.conf

# copy-in WAL files from new primary's archive to old primary
(cd newarch;
for f in `ls`; do
  if [[ "$f" > "$start_wal" ]]; then echo copy $f; cp $f ../oldprim/pg_wal; fi
done)

postgres -D oldprim
====

-- 
Kyotaro Horiguchi
NTT Open Source Software Center