[HACKERS] Cascading replication and recovery_target_timeline='latest'

Heikki Linnakangas Fri, 31 Aug 2012 01:04:07 -0700

When a cascading standby launches a new walsender, it fetches the current recovery timeline:


        /*
         * Use the recovery target timeline ID during recovery
         */
        if (am_cascading_walsender)
                ThisTimeLineID = GetRecoveryTargetTLI();


GetRecoveryTargetTLI() does this:

        /* RecoveryTargetTLI doesn't change so we need no lock to copy it */
        return XLogCtl->RecoveryTargetTLI;

That comment is not true. RecoveryTargetTLI can change during recovery, if you set recovery_target_timeline='latest'. In 'latest' mode, when the (apparent) end of WAL is reached, the archive is scanned for any new timeline history files that may have appeared. If a new timeline is found, RecoveryTargetTLI is updated, and recovery is continued on the new timeline.

Aside from the missing locking, I wonder what that does to a cascaded standby. If there is an active walsender running while RecoveryTargetTLI is changed, I think what will happen is that the walsender will continue to stream WAL from the old timeline, but because the startup process is now actually replaying from a different timeline, the walsender will send bogus WAL to the standby.

When a standby ends recovery, creates a new timeline, and switches to normal operation, postmaster terminates all walsenders because of the timeline change. But don't we have a race condition there, with similar effect? It might take a while for a walsender to die, and in that window, it might send bogus WAL to the cascaded standby.


- Heikki

