Hi all,

While working with pg_rewind, I noticed that it can sometimes request a
rewind even when no actual changes exist after a failover.

*Problem:*
Currently, pg_rewind determines the end-of-WAL on the target by using the
last shutdown checkpoint (or minRecoveryPoint for a standby). This creates
a false positive scenario:

1) Suppose a standby is promoted to become the new primary.
2) Later, the old primary is cleanly shut down.
3) The only WAL record generated on the old primary after divergence is a
shutdown checkpoint.

At this point, the old primary and new primary contain identical data.
However, since the shutdown checkpoint extends the WAL past the divergence
point, pg_rewind concludes:

if (target_wal_endrec > divergerec)
    rewind_needed = true;

That forces a rewind even though there are no meaningful changes.

To *reproduce this scenario*, use the attached script.

*Fix:*
The attached patch changes the logic so that pg_rewind no longer treats
shutdown checkpoints as meaningful records when determining the end-of-WAL.
Instead, it scans backward from the last checkpoint until it finds the most
recent valid WAL record that is not purely shutdown-related.

This ensures rewind is only triggered when there are actual modifications
after divergence, avoiding unnecessary rewinds in clean failover scenarios.
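
To make the intended decision concrete, here is a minimal, self-contained
sketch of the idea. It is a toy model and not the code from the patch: the
names Lsn, RecKind, WalRec, last_meaningful_end and rewind_needed are all
hypothetical, the WAL is modelled as a plain in-memory array, and only
shutdown checkpoints are treated as skippable (the actual patch may classify
records differently).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for PostgreSQL's XLogRecPtr */

/* Hypothetical, simplified record kinds; real WAL has many more. */
typedef enum
{
    REC_DATA,                   /* any record that modifies data */
    REC_CHECKPOINT_SHUTDOWN     /* shutdown checkpoint */
} RecKind;

typedef struct
{
    Lsn     end_lsn;            /* LSN just past the end of this record */
    RecKind kind;
} WalRec;

/*
 * Walk backward from the newest record and return the end LSN of the most
 * recent record that is not a shutdown checkpoint. Returns 0 if there is
 * no such record.
 */
static Lsn
last_meaningful_end(const WalRec *recs, size_t nrecs)
{
    assert(recs != NULL || nrecs == 0);

    for (size_t i = nrecs; i > 0; i--)
    {
        if (recs[i - 1].kind != REC_CHECKPOINT_SHUTDOWN)
            return recs[i - 1].end_lsn;
    }
    return 0;
}

/*
 * Same comparison as today, but against the end of the last meaningful
 * record instead of the end of the shutdown checkpoint.
 */
static bool
rewind_needed(const WalRec *recs, size_t nrecs, Lsn divergerec)
{
    return last_meaningful_end(recs, nrecs) > divergerec;
}
```

In this model, a target whose only record past divergerec is a shutdown
checkpoint is reported as not needing a rewind, while any data record past
divergerec still triggers one.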


-- 
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
#!/bin/bash

# Paths
BASE=$HOME
PRIMARY=$BASE/primary
REPLICA=$BASE/replica
# Locate Postgres binaries automatically from PATH
BIN=$(pg_config --bindir)

# Clean old dirs
rm -rf "$PRIMARY" "$REPLICA" "$BASE/archive"

# Init primary
$BIN/initdb -D $PRIMARY
echo "archive_mode = on" >> $PRIMARY/postgresql.conf
echo "archive_command = 'cp %p $BASE/archive/%f'" >> $PRIMARY/postgresql.conf
mkdir -p $BASE/archive
$BIN/pg_ctl -D $PRIMARY -l $PRIMARY/logfile start

# Take base backup for replica
$BIN/pg_basebackup -D $REPLICA -h localhost -p 5432 -P -Xs -R

# Change port of replica to 5433
echo "port = 5433" >> $REPLICA/postgresql.conf

$BIN/pg_ctl -D $REPLICA -l $REPLICA/logfile start

# Promote replica (becomes new primary)
$BIN/pg_ctl -D $REPLICA -l $REPLICA/logfile promote

# Stop both clusters
$BIN/pg_ctl -D $PRIMARY -l $PRIMARY/logfile -m immediate stop
$BIN/pg_ctl -D $REPLICA -l $REPLICA/logfile stop

echo "restore_command = 'cp $BASE/archive/%f %p'" >> $PRIMARY/postgresql.conf

# Run pg_rewind (old primary -> follow new primary)
$BIN/pg_rewind --progress --debug -c --source-pgdata=$REPLICA --target-pgdata=$PRIMARY

Attachment: v1-0001-pg_rewind-ignore-shutdown-only-WAL-when-determining-.patch
