Hi all,

While working with pg_rewind, I noticed that it can sometimes request a rewind even when no actual changes exist after a failover.
*Problem:* Currently, pg_rewind determines the end of WAL on the target from the last shutdown checkpoint (or from minRecoveryPoint, if the target was a standby). This produces a false positive in the following scenario:

1) A standby is promoted to become the new primary.
2) Later, the old primary is cleanly shut down.
3) The only WAL record written on the old primary after the divergence point is the shutdown checkpoint.

At this point, the old primary and the new primary contain identical data. However, because the shutdown checkpoint extends the target's WAL past the divergence point, pg_rewind concludes:

    if (target_wal_endrec > divergerec)
        rewind_needed = true;

This forces a rewind even though there are no meaningful changes to undo.

To *reproduce this scenario*, use the script attached below.

*Fix:* The attached patch changes the logic so that pg_rewind no longer treats shutdown checkpoints as meaningful records when determining the end of WAL. Instead, we scan backward from the last checkpoint until we find the most recent valid WAL record that is not a shutdown-only record. This ensures that a rewind is triggered only when there are actual modifications after the divergence point, avoiding unnecessary rewinds in clean failover scenarios.

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
```bash
#!/bin/bash

# Paths
BASE=$HOME
PRIMARY=$BASE/primary
REPLICA=$BASE/replica

# Locate Postgres binaries automatically from PATH
BIN=$(pg_config --bindir)

# Clean old dirs
rm -rf "$PRIMARY" "$REPLICA" "$BASE/archive"

# Init primary
"$BIN/initdb" -D "$PRIMARY"
echo "archive_mode = on" >> "$PRIMARY/postgresql.conf"
echo "archive_command = 'cp %p $BASE/archive/%f'" >> "$PRIMARY/postgresql.conf"
mkdir -p "$BASE/archive"
"$BIN/pg_ctl" -D "$PRIMARY" -l "$PRIMARY/logfile" start

# Take base backup for replica
"$BIN/pg_basebackup" -D "$REPLICA" -h localhost -p 5432 -P -Xs -R

# Change port of replica to 5433
echo "port = 5433" >> "$REPLICA/postgresql.conf"
"$BIN/pg_ctl" -D "$REPLICA" -l "$REPLICA/logfile" start

# Promote replica (becomes new primary)
"$BIN/pg_ctl" -D "$REPLICA" -l "$REPLICA/logfile" promote

# Stop both clusters
"$BIN/pg_ctl" -D "$PRIMARY" -l "$PRIMARY/logfile" -m immediate stop
"$BIN/pg_ctl" -D "$REPLICA" -l "$REPLICA/logfile" stop

echo "restore_command = 'cp $BASE/archive/%f %p'" >> "$PRIMARY/postgresql.conf"

# Run pg_rewind (old primary -> follow new primary)
"$BIN/pg_rewind" --progress --debug -c --source-pgdata="$REPLICA" --target-pgdata="$PRIMARY"
```
v1-0001-pg_rewind-ignore-shutdown-only-WAL-when-determining-.patch