On Mon, Jan 18, 2010 at 11:42 PM, Simon Riggs <[email protected]> wrote:
> On Mon, 2010-01-18 at 09:31 -0500, Tom Lane wrote:
>> Fujii Masao <[email protected]> writes:
>> > When I configured a cascaded standby (i.e, made the additional
>> > standby server connect to the standby), I got the following
>> > errors, and a cascaded standby didn't start replication.
>>
>> > ERROR: timeline 0 of the primary does not match recovery target
>> > timeline 1
>>
>> > I didn't care about that case so far. To avoid a confusing error
>> > message, we should forbid a startup of walsender during recovery,
>> > and emit a suitable message? Or support such cascade-configuration?
>> > Though I don't think that the latter is difficult to be implemented,
>> > ISTM it's not the time to do that now.
>>
>> It would be kind of silly to add code to forbid it if making it work
>> would be about the same amount of effort. I think it'd be worth looking
>> closer to find out what the problem is.
>
> There is an ERROR, but no problem AFAICS. The tli isn't set until end of
> recovery because it doesn't need to have been set yet. That shouldn't
> prevent retrieving WAL data.

OK. Here is a patch that supports running a walsender process during recovery:

* Change walsender so that it sends the WAL written by the walreceiver
  if it was started during recovery.
* Kill the walsenders started during recovery at the end of recovery,
  because replication cannot survive the change of timeline ID.
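
To illustrate the two points above outside the backend, here is a minimal standalone sketch. It is not part of the patch: DemoWalSnd, demo_get_write_ptr() and demo_collect_cascaded() are hypothetical names used only for illustration, WAL positions are simplified to plain 64-bit integers, and the sketch records the pids it would signal instead of calling kill().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WalSnd shared-memory slot; not the real struct. */
typedef struct DemoWalSnd
{
    int  pid;       /* process id of the walsender, or 0 if the slot is free */
    bool cascaded;  /* was this walsender started during recovery? */
} DemoWalSnd;

/*
 * During recovery the WAL available for sending is whatever the
 * walreceiver has written so far; otherwise it is the local write
 * position. Positions are simplified to uint64_t (an assumption).
 */
uint64_t
demo_get_write_ptr(bool in_recovery, uint64_t local_write,
                   uint64_t walrcv_write)
{
    return in_recovery ? walrcv_write : local_write;
}

/*
 * Collect the pids of the cascaded walsenders into 'out', which must
 * have room for 'nslots' entries, and return how many were found.
 * The patch sends SIGTERM at this point; the sketch only records the
 * pids so that it has no side effects.
 */
size_t
demo_collect_cascaded(const DemoWalSnd *slots, size_t nslots, int *out)
{
    size_t found = 0;

    assert(slots != NULL && out != NULL);

    for (size_t i = 0; i < nslots; i++)
    {
        /* Skip free slots and walsenders started after recovery ended. */
        if (slots[i].pid != 0 && slots[i].cascaded)
            out[found++] = slots[i].pid;
    }
    return found;
}
```

The point of the cascaded flag is that only the walsenders started during recovery are stopped at the end of recovery; free slots and any other walsenders are left alone.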

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 6384,6389 **** StartupXLOG(void)
--- 6384,6397 ----
xlogctl->SharedRecoveryInProgress = false;
SpinLockRelease(&xlogctl->info_lck);
}
+
+ /*
+ * Kill the walsender processes which were started during recovery
+ * since they cannot survive the change of timeline ID at the end of
+ * an archive recovery. Here is the right place to do that because
+ * new 'cascaded' walsender will not be started from here on.
+ */
+ ShutdownCascadedWalSnds();
}
/*
***************
*** 6666,6671 **** GetWriteRecPtr(void)
--- 6674,6682 ----
volatile XLogCtlData *xlogctl = XLogCtl;
XLogRecPtr recptr;
+ if (LocalRecoveryInProgress)
+ return GetWalRcvWriteRecPtr();
+
SpinLockAcquire(&xlogctl->info_lck);
recptr = xlogctl->LogwrtResult.Write;
SpinLockRelease(&xlogctl->info_lck);
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 491,496 **** InitWalSnd(void)
--- 491,505 ----
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
errmsg("sorry, too many standbys already")));
+ /*
+ * Use the recovery target timeline ID during recovery.
+ */
+ if (RecoveryInProgress())
+ {
+ MyWalSnd->cascaded = true;
+ ThisTimeLineID = GetRecoveryTargetTLI();
+ }
+
/* Arrange to clean up at walsender exit */
on_shmem_exit(WalSndKill, 0);
}
***************
*** 506,511 **** WalSndKill(int code, Datum arg)
--- 515,521 ----
* for this.
*/
MyWalSnd->pid = 0;
+ MyWalSnd->cascaded = false;
/* WalSnd struct isn't mine anymore */
MyWalSnd = NULL;
***************
*** 848,850 **** GetOldestWALSendPointer(void)
--- 858,880 ----
}
return oldest;
}
+
+ /*
+ * Stop only the cascaded walsender processes.
+ */
+ void
+ ShutdownCascadedWalSnds(void)
+ {
+ int i;
+
+ for (i = 0; i < MaxWalSenders; i++)
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+ pid_t walsndpid;
+
+ walsndpid = walsnd->pid;
+ if (walsndpid != 0 && walsnd->cascaded)
+ kill(walsndpid, SIGTERM);
+ }
+ }
*** a/src/include/replication/walsender.h
--- b/src/include/replication/walsender.h
***************
*** 22,27 **** typedef struct WalSnd
--- 22,28 ----
{
pid_t pid; /* this walsender's process id, or 0 */
XLogRecPtr sentPtr; /* WAL has been sent up to this point */
+ bool cascaded; /* this walsender is started during recovery? */
slock_t mutex; /* locks shared variables shown above */
} WalSnd;
***************
*** 45,49 **** extern void WalSndSignals(void);
--- 46,51 ----
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern XLogRecPtr GetOldestWALSendPointer(void);
+ extern void ShutdownCascadedWalSnds(void);
#endif /* _WALSENDER_H */
--
Sent via pgsql-hackers mailing list ([email protected])