Hi,

In streaming replication, when we shut down the master, walsender tries to send all the outstanding WAL records, including the shutdown checkpoint record, to the standby, and then exits. This basically means that all the WAL records are fully synced between the two servers after a clean shutdown of the master. So, after promoting the standby to be the new master, we can restart the stopped master as a new standby without needing a fresh backup from the new master.
But there is one problem: though walsender tries to send all the outstanding WAL records, it doesn't wait for them to be replicated to the standby. In other words, walsender closes the replication connection as soon as it has sent the WAL records. Then, before receiving all of those WAL records, walreceiver can detect the closure of the connection and exit. So we cannot guarantee that no WAL is missing on the standby after a clean shutdown of the master. When WAL is missing, a backup from the new master is required in order to restart the stopped master as a new standby. I have experienced this case several times, especially with WAL archiving enabled.

The attached patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection.

You may be concerned about the case where the standby gets stuck and walsender keeps waiting for a reply from that standby. In this case, wal_sender_timeout detects the inactive standby and walsender then exits. So even in that case, the shutdown can complete.

Thoughts?

Regards,

--
Fujii Masao
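For readers who want to see the intended shutdown behavior in miniature, here is a rough Python simulation. It is not the code in the patch; `StandbySim`, `shutdown_walsender`, and the tick-based timeout are hypothetical names and simplifications made up for this sketch. In particular, a stuck standby is modelled as one whose flushed position stops advancing, which only approximates what wal_sender_timeout actually measures.

```python
from dataclasses import dataclass


@dataclass
class StandbySim:
    """Hypothetical standby that flushes at most `ack_per_tick` WAL bytes per tick."""
    ack_per_tick: int
    flushed: int = 0

    def reply(self, sent_upto: int) -> int:
        # Report the flushed position, never beyond what the master has sent.
        self.flushed = min(sent_upto, self.flushed + self.ack_per_tick)
        return self.flushed


def shutdown_walsender(wal_end: int, standby: StandbySim, timeout_ticks: int) -> str:
    """Simulate walsender during a clean master shutdown.

    Returns "caught_up" if the standby confirms all WAL up to `wal_end`,
    or "timeout" if it makes no progress for `timeout_ticks` ticks
    (a stand-in for wal_sender_timeout).
    """
    sent_upto = wal_end  # all outstanding WAL, incl. the shutdown checkpoint, is sent
    idle_ticks = 0
    last_flushed = standby.flushed
    while True:
        flushed = standby.reply(sent_upto)
        if flushed >= sent_upto:
            return "caught_up"  # safe to close the replication connection
        if flushed > last_flushed:
            idle_ticks = 0  # the standby is alive and progressing
            last_flushed = flushed
        else:
            idle_ticks += 1
            if idle_ticks >= timeout_ticks:
                return "timeout"  # give up so the shutdown can still finish


healthy = StandbySim(ack_per_tick=100)
stuck = StandbySim(ack_per_tick=0)
print(shutdown_walsender(1000, healthy, timeout_ticks=3))
print(shutdown_walsender(1000, stuck, timeout_ticks=3))
```

The point of the sketch is only the ordering: the connection is closed after the standby has confirmed everything, and the timeout bounds how long a shutdown can be held up by an unresponsive standby.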
switchover_v1.patch