Thank you, Robert. I thought that removing the recovery.conf file turns the slave into a master only after the slave is restarted (unlike creating a trigger_file). Isn't this true?

I also thought that if there was a crash on the original master and it applied WAL entries to itself that are not present on the slave, then it would throw an error when I try to connect it to the new master (the old slave). It would be nice to know, as creating a base backup takes a long time.

As for the other case, when there was no crash: safely swapping the master and the slave twice without creating base backups makes upgrading the OS much easier (with only a couple of seconds of downtime). I am afraid to try it in production until someone confirms that it's safe. It seems to work, though (but I don't like to bet).

M.

2014-10-29 15:41 GMT+01:00 Robert Haas <robertmh...@gmail.com>:

> On Wed, Oct 29, 2014 at 6:21 AM, Maeldron T. <maeld...@gmail.com> wrote:
> > I swear I have read a couple of old threads. Yet I am not sure if it safe to
> > failback to the old master in case of async replication without base backup.
> >
> > Considering:
> > I have the latest 9.3 server
> > A: master
> > B: slave
> > B is actively connected to A
> >
> > I shut down A manually with -m fast (it's the default FreeBSD init script
> > setting)
> > I remove the recovery.conf from B
> > I restart B
> > I create a recovery.conf on A
> > I start A
> > I see nothing wrong in the logs
> > I go for a lunch
> > I shut down B
> > I remove the recovery.conf on A
> > I restart A
> > I restore the recovery.conf on B
> > I start B
> > I see nothing wrong in the logs and I see that replication is working
> >
> > Can I say that my data is safe in this case?
> >
> > If the answer is yes, is it safe to do this if there was a power outage on A
> > instead of manual shutdown? Considering that the log says nothing wrong. (Of
> > course if it complains I'd do base backup from B).
>
> The threshold question here is whether the original master might have
> written (and thus, perhaps, applied) write-ahead log records that were
> not replayed on the slave. If A crashed, that is definitely possible,
> so this is definitely not safe. If A was shut down cleanly, then
> streaming replication *should* take everything up through the shutdown
> checkpoint and replicate those to the standby, which *should* replay
> them. If all goes according to plan, I think this will work.
>
> I'm not sure we really have enough safeties to make this robust,
> though: for example, at the point when the shutdown checkpoint is
> written, I believe that the master is no longer accepting new
> connections - so if the connection to the slave is broken before the
> shutdown checkpoint record is replicated, then it's not safe any more,
> but how will we detect that? And, if you remove recovery.conf on the
> slave, it will abort replay and enter normal running as soon as it
> reaches what it thinks is end-of-WAL, with no cross-check to make sure
> that's really the same was point that the master was actually at. So
> it strikes me that it might be quite difficult to really have
> confidence that nothing will go wrong.
>
> I'm definitely not the expert in this area on this mailing list, so
> I'm curious what others think.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
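
The cross-check Robert describes as missing could be approximated by hand before promoting the slave. Below is a minimal sketch in Python, not a tested procedure. It assumes you have the text printed by pg_controldata on the stopped master and the value returned by SELECT pg_last_xlog_replay_location() on the slave, taken while the slave is still in recovery. The function names are hypothetical. The comparison is only a necessary condition: it checks that the master reports a clean shutdown and that the slave has replayed past the start of the master's latest checkpoint record.

```python
def parse_lsn(text):
    """Convert a WAL location such as '0/3000060' to an integer byte position."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_controldata(output):
    """Turn 'Label:   value' lines of pg_controldata output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def standby_has_shutdown_checkpoint(controldata_output, standby_replay_lsn):
    """Return True only if the master was shut down cleanly and the slave
    replayed beyond the start of the master's latest checkpoint record."""
    fields = parse_controldata(controldata_output)
    # Anything other than a clean shutdown (e.g. a crash) fails the check.
    if fields.get("Database cluster state") != "shut down":
        return False
    checkpoint = parse_lsn(fields["Latest checkpoint location"])
    # The replay location points past the last replayed record, so it must be
    # strictly greater than the start of the shutdown checkpoint record.
    return parse_lsn(standby_replay_lsn) > checkpoint
```

If the function returns False, the cautious choice is still a fresh base backup from the new master. A True result does not prove the swap is safe; it only rules out the two failure cases raised in this thread (a crash on A, or a connection that broke before the shutdown checkpoint was replicated).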