On Mon, Aug 18, 2014 at 11:16 PM, Sawada Masahiko <sawada.m...@gmail.com> wrote:
> Hi all,
> After switching the primary server while using replication slots, a
> standby server may not be able to connect to the new primary server.
> Imagine this situation: the primary server has two ASYNC standby
> servers, each using its own replication slot.
> Standby (A) applies WAL without problems, but the other one,
> standby (B), has stopped after connecting to the primary server
> (or WAL is being sent to it with a large delay).
> In this situation, standby (B) does not receive WAL segment files
> while it is stopped.
> The primary server cannot remove WAL segments that have not been
> received by all standbys, so it has to keep those segment files.
> But standby (A) can perform checkpoints by itself, and can then
> recycle WAL segments.
> So the number of WAL segments kept on each server differs:
> standby (A) has fewer WAL files than the primary server.
> If the primary server crashes and standby (A) is promoted to primary,
> we can try to connect standby (B) to standby (A) as a new standby server.
> But this will fail, because standby (A) might no longer have the WAL
> segment files that standby (B) requires.

This sounds like a valid concern.

> To resolve this, I think the master server should notify all standby
> servers when it removes WAL segments, and the standby servers should
> recycle WAL segment files based on that information.
> Thoughts?

How does the server recycle WAL files after it's promoted from
standby to master? Does it simply do so as it likes? If so, your
approach would not be enough.

The approach prevents unexpected removal of WAL files while the standby
is running. But after the standby is promoted to master, it might recycle
needed WAL files immediately. So another standby may still fail to retrieve
the required WAL file after the promotion.
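
A toy Python model may make the two failure modes concrete. It is a sketch under loud assumptions, not PostgreSQL code: WAL segments are plain integers, each server keeps every segment at or above a single "horizon", slots exist only on the old primary (so the promoted server has no slot for B), and the segment numbers and checkpoint positions are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy model of a server's local WAL segments (hypothetical, illustrative only)."""
    name: str
    segments: set[int] = field(default_factory=set)

    def recycle_before(self, horizon: int) -> None:
        # Recycle (drop) every local segment numbered below `horizon`.
        self.segments = {s for s in self.segments if s >= horizon}

    def can_serve(self, segment: int) -> bool:
        # A standby can only resume streaming if the sender still has its segment.
        return segment in self.segments


def simulate(notify_removal: bool) -> dict[str, bool]:
    wal = set(range(1, 11))            # assumed: segments 1..10 written by the primary
    slot_restart = {"A": 9, "B": 3}    # assumed: B stalled, so its slot pins segment 3

    primary = Server("primary", set(wal))
    standby_a = Server("A", set(wal))  # A has received everything

    # The primary may only recycle segments that no slot still needs.
    primary_horizon = min(slot_restart.values())
    primary.recycle_before(primary_horizon)

    # Before promotion, A recycles either on its own checkpoint (today)
    # or only what the primary announced it removed (the proposal).
    a_checkpoint_before = 9            # assumed checkpoint position on A
    standby_a.recycle_before(primary_horizon if notify_removal else a_checkpoint_before)
    at_promotion = standby_a.can_serve(slot_restart["B"])

    # After promotion A has no slot for B, so its own checkpoint drives recycling.
    a_checkpoint_after = 10            # assumed first checkpoint as new primary
    standby_a.recycle_before(a_checkpoint_after)
    after_checkpoint = standby_a.can_serve(slot_restart["B"])

    return {
        "primary_serves_B": primary.can_serve(slot_restart["B"]),
        "serves_B_at_promotion": at_promotion,
        "serves_B_after_checkpoint": after_checkpoint,
    }


print(simulate(notify_removal=False))
print(simulate(notify_removal=True))
```

In this model, without notification A cannot serve B's segment even at the moment of promotion; with notification it can, but it loses that segment at its first checkpoint as the new primary, which is the gap that replicating slot state would have to close.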

ISTM that, in order to address this, we might need to log all replication
slot activity and replicate it to the standbys. I'm not sure whether this
breaks the design of replication slots, though.


Fujii Masao

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)