Re: [HACKERS] Tracking latest timeline in standby mode

Fujii Masao Mon, 01 Nov 2010 03:33:20 -0700

On Wed, Oct 27, 2010 at 11:42 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> At the moment, when you specify recovery_target_timeline='latest', we scan
> for the latest timeline at the beginning of recovery, and pick that as the
> target. If new timelines appear during recovery, we stick to the target
> chosen in the beginning, the new timelines are ignored. That's undesirable
> if you have one master and two standby servers, and failover happens to one
> of the standbys. The other standby won't automatically start tracking the
> new TLI created by the promoted new master, it requires a restart to notice.
>
> This was discussed a while ago:
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg00620.php
>
> More work needs to be done to make that work over streaming replication,
> sending history files over the wire, for example, but let's take baby steps.
> At the very minimum the startup process should notice new timelines
> appearing in the archive. The attached patch does that.
>
> Comments?


Currently the startup process rescans the timeline history file only
when walreceiver is not running. But if walreceiver receives that file
from the master in the future, shouldn't the startup process rescan it
even while walreceiver is running?

> A related issue is that we should have a check for the issue I also
> mentioned in the comments:
>
>>        /*
>>         * If the current timeline is not part of the history of the
>>         * new timeline, we cannot proceed to it.
>>         *
>>         * XXX This isn't foolproof: The new timeline might have forked from
>>         * the current one, but before the current recovery location. In that
>>         * case we will still switch to the new timeline and proceed replaying
>>         * from it even though the history doesn't match what we already
>>         * replayed. That's not good. We will likely notice at the next online
>>         * checkpoint, as the TLI won't match what we expected, but it's
>>         * not guaranteed. The admin needs to make sure that doesn't happen.
>>         */
>
> but that's a pre-existing and orthogonal issue, it can happen with the current
> code too if you restart the standby, so let's handle that as a separate patch.

I'm thinking of writing the timeline switch LSN to the timeline history file,
and comparing that LSN with the location of the last applied WAL record when
the file is rescanned. If the timeline switch LSN is ahead, we cannot do the
switch.

Currently the timeline history file contains the timeline switch WAL filename,
but it's not used at all. As a first step, what about replacing that filename
with the switch LSN?
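
A rough sketch of that check, as standalone C rather than actual xlog.c code.
Everything named here is hypothetical: the Lsn typedef, the helper names, and
the line layout "<parent TLI> <hi>/<lo> <reason>" with the switch LSN in hex
are assumptions for illustration only. The direction of the comparison follows
the quoted XXX comment: the switch is refused when the new timeline forked off
before the location that has already been replayed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a WAL location; not PostgreSQL's XLogRecPtr. */
typedef uint64_t Lsn;

/*
 * Parse one history-file line in the assumed layout
 *     <parent TLI> <tab> <hi>/<lo> <tab> <reason>
 * where <hi>/<lo> is the switch LSN in hexadecimal.
 * Returns true if the TLI and LSN fields were both read.
 */
static bool
parse_history_line(const char *line, unsigned *parent_tli, Lsn *switch_lsn)
{
    unsigned hi;
    unsigned lo;

    if (sscanf(line, "%u\t%X/%X", parent_tli, &hi, &lo) != 3)
        return false;

    *switch_lsn = ((Lsn) hi << 32) | lo;
    return true;
}

/*
 * Assumed rule: switching is safe only if the fork point is not behind
 * what we have already replayed. If the new timeline forked earlier,
 * some of the WAL we applied is not part of its history.
 */
static bool
timeline_switch_is_safe(Lsn switch_lsn, Lsn last_replayed)
{
    return switch_lsn >= last_replayed;
}
```

With a switch LSN of 0/5000000, a standby that has replayed up to 0/4000000
could follow the new timeline, while one that has already replayed up to
0/6000000 could not.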

+                       /* Switch target */
+                       recoveryTargetTLI = newtarget;
+                       expectedTLIs = newExpectedTLIs;

Before "expectedTLIs = newExpectedTLIs", shouldn't we call
list_free_deep(expectedTLIs)?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
