Re: [HACKERS] Failback to old master

Maeldron T. Wed, 29 Oct 2014 09:44:08 -0700

Thank you, Robert.

I thought that removing the recovery.conf file makes the slave a master
only after the slave is restarted (unlike creating a trigger_file). Isn't
this true?


I also thought that if there was a crash on the original master and it
applied WAL entries on itself that are not present on the slave, then it
will throw an error when I try to connect it to the new master (the old
slave).

It would be nice to know, as creating a base backup takes a long time.

As for the other case, when there was no crash, safely swapping the master
and the slave twice without creating base backups makes upgrading the OS
much easier (with only a couple of seconds of downtime).

I am afraid to try it on production until someone confirms that it's safe.
It seems to work though (but I don't like to bet).

M.

2014-10-29 15:41 GMT+01:00 Robert Haas <robertmh...@gmail.com>:

> On Wed, Oct 29, 2014 at 6:21 AM, Maeldron T. <maeld...@gmail.com> wrote:
> > I swear I have read a couple of old threads. Yet I am not sure if it is
> > safe to fail back to the old master in case of async replication without
> > a base backup.
> >
> > Considering:
> > I have the latest 9.3 server
> > A: master
> > B: slave
> > B is actively connected to A
> >
> > I shut down A manually with -m fast (it's the default FreeBSD init script
> > setting)
> > I remove the recovery.conf from B
> > I restart B
> > I create a recovery.conf on A
> > I start A
> > I see nothing wrong in the logs
> > I go for a lunch
> > I shut down B
> > I remove the recovery.conf on A
> > I restart A
> > I restore the recovery.conf on B
> > I start B
> > I see nothing wrong in the logs and I see that replication is working
> >
> > Can I say that my data is safe in this case?
> >
> > If the answer is yes, is it safe to do this if there was a power outage
> > on A instead of a manual shutdown? Considering that the log says nothing
> > wrong. (Of course if it complains I'd do a base backup from B.)
>
> The threshold question here is whether the original master might have
> written (and thus, perhaps, applied) write-ahead log records that were
> not replayed on the slave.  If A crashed, that is definitely possible,
> so this is definitely not safe.  If A was shut down cleanly, then
> streaming replication *should* take everything up through the shutdown
> checkpoint and replicate those to the standby, which *should* replay
> them.  If all goes according to plan, I think this will work.
>
> I'm not sure we really have enough safeties to make this robust,
> though: for example, at the point when the shutdown checkpoint is
> written, I believe that the master is no longer accepting new
> connections - so if the connection to the slave is broken before the
> shutdown checkpoint record is replicated, then it's not safe any more,
> but how will we detect that?  And, if you remove recovery.conf on the
> slave, it will abort replay and enter normal running as soon as it
> reaches what it thinks is end-of-WAL, with no cross-check to make sure
> that's really the same point that the master was actually at.  So
> it strikes me that it might be quite difficult to really have
> confidence that nothing will go wrong.
>
> I'm definitely not the expert in this area on this mailing list, so
> I'm curious what others think.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
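
For illustration only, here is a minimal sketch of the kind of manual
cross-check discussed above: before removing recovery.conf on the slave,
compare the old master's shutdown checkpoint location with the slave's
replay location. It assumes you have captured the output of pg_controldata
on the cleanly stopped master and the result of
SELECT pg_last_xlog_replay_location() on the slave (9.3 naming). The
function names are hypothetical, and passing this check is a necessary
condition at best, not a guarantee that failback is safe.

```python
import re


def parse_lsn(text):
    """Convert an LSN such as '0/3000028' into a single integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def controldata_field(output, label):
    """Return the value of one 'label: value' line of pg_controldata output."""
    m = re.search(r"^" + re.escape(label) + r":\s*(.+?)\s*$", output, re.MULTILINE)
    if m is None:
        raise ValueError("field not found: " + label)
    return m.group(1)


def standby_has_shutdown_checkpoint(controldata_output, standby_replay_lsn):
    """Hypothetical helper: True only if the master reports a clean shutdown
    and the standby has replayed past the start of the master's latest
    (shutdown) checkpoint record."""
    state = controldata_field(controldata_output, "Database cluster state")
    if state != "shut down":
        # Crashed or still running: the master may hold WAL the standby lacks.
        return False
    checkpoint = parse_lsn(
        controldata_field(controldata_output, "Latest checkpoint location")
    )
    # The checkpoint location is where the record starts; a standby that
    # replayed it reports a position strictly beyond that start.
    return parse_lsn(standby_replay_lsn) > checkpoint
```

If the function returns False, the conservative choice is the one already
mentioned in the thread: take a fresh base backup from B.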
